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Preface 


The first edition of Physical Biochemistry: Principles and Applications set out to describe the physical basis and some 
examples of applications of key physically-based techniques used in Biochemistry and other areas of molecular life science 
research. In the last decade there has been a noticeable renaissance in some traditional techniques such as X-ray diffraction, 
ultracentrifugation and electrophoresis in a variety of formats. In the same time-frame ‘hyphenated’ techniques (e.g. LC- 
tandem MS) have become much more mainstream and some instrumentation has become available in desktop formats making 
quite sophisticated analysis possible even in the nonspecialist lab. The emphasis was on a largely nonmathematical treatment 
at a level appropriate to students in the penultimate year of a Biochemistry course with a view to making these techniques 
comprehensible and accessible to a level intermediate between a general Biochemistry textbook and a specialist text. The 
feedback I have had from many readers is that this goal was largely achieved. 

My task with this second edition was to retain as much as possible of the description of physical principles whilst updating 
and integrating new material into a reasonably compact volume. The first edition was strongly influenced by the effects 
of the then-recent completion of the Human and other large-scale genome sequencing projects. At the time Proteomics 
and other ‘-omics’ technologies were still relatively new paradigms and bioinformatics approaches largely the province of 
the specialist. It is remarkable how quickly these technologies have now become embedded in many areas of biochemical 
research so that they are now perceived as part of the mainstream. In the second edition I have dedicated new chapters to 
proteomics and bioinformatics, respectively, to reflect this changed situation and to emphasize how interconnected physical 
and computational techniques have now become. 

The second edition required choices to be made and these are inevitably influenced by one’s perception of what is needed 
and useful for students to know at the start of their scientific journey into molecular life science. In making these choices 
I have continued to be guided by what I perceive to be the most generally-used and helpful techniques but I accept that 
specialists in one or more technique may disagree. 

In preparing this second edition I had help from many of the colleagues listed in the preface to the First Edition. 
In addition, I must thank Dr Rebecca Green, School of Pharmaceutical Sciences, University of Nottingham, UK, for her 
invaluable comments on surface plasmon resonance (Chapter 3). I am also indebted to the excellent staff at Wiley’s especially 
my editors, Celia Carden and Fiona Woods for their unfailing encouragement and understanding. However, any errors in the 
text are my own. I earnestly hope that the reader will find something interesting and thought-provoking in this volume and 
be encouraged to explore these very powerful approaches in their work. 


Prof David Sheehan 
University College Cork 
June 2008 


Chapter 1 


Introduction 


Objectives 


* Quantification of physical phenomena. 
e Objectives of this volume. 


After completing this chapter you should be familiar with: 


¢ The special chemical conditions often required by biomolecules. 
e Importance of the use of buffers in the study of physical phenomena in biochemistry. 


This volume describes a range of physical techniques which 
are now widely used in the study both of biomolecules 
and of processes in which they are involved. There will 
be a strong emphasis throughout on biomacromolecules 
such as proteins and nucleic acids as well as on macro- 
molecular complexes of which they are components (e.g. 
biological membranes, ribosomes, chromosomes). This is 
because such chemical entities are particularly crucial to the 
correct functioning of living cells and present specific an- 
alytical problems compared to simpler biomolecules such 
as monosaccharides or dipeptides. Biophysical techniques, 
give detailed information offering insights into the structure, 
dynamics and interactions of biomacromolecules. 

Life scientists in general and biochemists in particular 
have devoted much effort during the last century to eluci- 
dation of the relationship between structure and function 
and to understanding how biological processes happen and 
are controlled. Major progress has been made using chem- 
ical and biological techniques which, for example, have 
contributed to the development of the science of molecular 
biology. However, in the last decade physical techniques 
which complement these other approaches have seen major 
development and these now promise even greater insight 
into the molecules and processes which allow the living 
cell to survive. For example, a major focus of life sci- 
ence research currently is the proteome as distinct from the 
genome. This has emphasized the need to be able to study 
the highly-individual structures of biomacromolecules such 
as proteins to understand more fully their particular contri- 
bution to the biology of the cell. For the foreseeable future, 
these techniques are likely to impact to a greater or lesser 
extent on the activities of most life scientists. This text at- 
tempts to survey the main physical techniques and to de- 
scribe how they can contribute to our knowledge of biolog- 
ical systems and processes. We will set the scene for this by 


first looking at the particular analytical problems posed by 
biomolecules. 


1.1 SPECIAL CHEMICAL 
REQUIREMENTS OF BIOMOLECULES 


The tens of thousands of biomolecules encountered in living 
cells may be classified into two general groups. Biomacro- 
molecules (e.g. proteins; nucleic acids) are characterized by 
high molecular mass (denoted throughout this text as rela- 
tive molecular mass, M,) and are generally unstable under 
extreme chemical conditions where they may lose structure 
or break down into their chemical building blocks. Low 
molecular weight molecules are smaller and more chem- 
ically robust (e.g. amino acids; nucleotides; fatty acids). 
Within each group there is displayed a wide range of water- 
solubility, chemical composition and reactivity which is de- 
termined by complex interactions between physicochemical 
attributes of the biomolecule and solvent. These attributes 
are the main focus for the techniques described in this vol- 
ume and reflect the highly individual function which each 
molecule performs in the cell (Tables 1.1 and 1.2). 
Notwithstanding the great range of form and structure, 
we can nonetheless recognize certain attributes as common 
to all biomolecules. The first and most obvious is that 
all of these molecules are produced in living cells under 
mild chemical conditions of temperature, pressure and pH. 
Biomacromolecules are built up from simpler building 
block molecules by covalent bonds formed usually with the 
elimination of water. Moreover, biomolecules are continu- 
ously synthesized and degraded in cells ina highly regulated 
manner. It follows from this that many biomolecules are es- 
pecially sensitive to extremes of temperature and pH which 
may present a problem in their handling prior to and during 


Physical Biochemistry: Principles and Applications, Second Edition David Sheehan 


© 2009 John Wiley & Sons, Ltd 


2 PHYSICAL BIOCHEMISTRY 


Table 1.1. Some important physical attributes of biomolecules 
amenable to study by biophysical techniques 


Physical attribute Technique 


Mass MS (Chapters 4 and 10) 

Electrophoresis (Chapters 5 
and 10) 

Gel filtration (Chapter 2) 

Gel filtration (Chapter 2) 

Centrifugation (Chapter 7) 

Pulsed field gel electrophoresis 
(Chapter 5) 

Ion exchange chromatography 
(Chapter 2) 

Electrophoresis/chromatofocusing 
(Chapters 5 and 10) 

MS (Chapters 4 and 10) 

Chromatography (Chapter 2) 

Electrophoresis (Chapters 5 
and 10) 

Crystallization (Chapter 6) 

Centrifugation (Chapter 7) 

Spectroscopy (Chapter 3) 

Biocalorimetry (Chapter 8) 


Volume/density 


Charge 


Shape 


Energy 


Table 1.2. Some important chemical attributes of biomolecules 
which may be used in study by biophysical techniques 


Chemical attribute Technique 


Composition MS (Chapters 4 and 10) 
Spectroscopy (Chapter 3) 
Spectroscopy (Chapter 3) 
Crystallization (Chapter 6) 
MS (Chapters 4 and 10) 
Bioinformatics (Chapter 9) 
Electrophoresis (Chapters 5 
and 10) 
Chromatography (Chapter 2) 
Electrophoresis (Chapters 5 
and 10) 
Spectroscopy (Chapter 3) 
Chromatography (Chapter 2) 
Electrophoresis (Chapter 5) 
Proteomics (Chapter 10) 
Chromatography (Chapter 2) 
Crystallography (Chapter 6) 
Precipitation (Chapter 7) 
Electrophoresis (Chapter 5) 
MS (Chapters 4 and 10) 
Spectroscopy (Chapter 3) 
Crystallography (Chapter 6) 


Molecular structure 


Covalent bonds 


Noncovalent bonds 


Native/denatured structure 


Solubility 


Complex formation 


any biophysical analysis. Since biomolecules result 
from a long process of biological evolution during which 
they have been selected to perform highly specific func- 
tions, a very close relationship has arisen between chemical 
structure and function. This means that, even at pH and 
temperature values under which the molecule may not be 
destroyed, it may function suboptimally or not at all. 

These facts impose limitations on the chemical conditions 
to which biomolecules may be exposed during extraction, 
purification or analysis. In most of the techniques described 
in this volume, sample analytes are exposed to a specific 
set of chemical conditions by being dissolved in a solution 
of defined composition. Whilst other components in the 
solution may also be important in individual cases as will 
be discussed below (Section 1.3.2), three main variables 
govern the makeup of this solution which are discussed in 
more detail in the following section. 


1.2 FACTORS AFFECTING ANALYTE 
STRUCTURE AND STABILITY 


In practice, most of the biophysical procedures described in 
this volume use conditions which have been optimized over 
many years for thousands of different samples. These robust 
conditions will normally maintain the sample in a defined 
structural form facilitating its separation and/or analysis. 
However, some procedures (e.g. chromatography, capillary 
electrophoresis, crystallization) may require case-by-case 
optimization of conditions. Before embarking on a detailed 
analysis of a biomolecule using biophysical techniques it is 
often useful to know something about the stability of the 
sample to chemical variables, especially pH, temperature 
and solvent polarity. This knowledge can help us to design 
a suitable solvent or set of chemical conditions which will 
maximize the stability of the analyte for the duration of 
the experiment and may also help us to explain unexpected 
results. For example, we sometimes find loss of enzyme ac- 
tivity during column chromatography which may be partly 
explained by the chemical conditions experienced by the 
protein during the experiment. Moreover, many of the tech- 
niques described in this volume are actually designed to 
be suboptimal and to take advantage of disruption of the 
normal functional structure of the biomolecule to facilitate 
separation or analysis (e.g. electrophoresis, HPLC, MS). 

A good indication of the most stabilizing conditions may 
often be obtained from knowledge of the biological origin 
of the biomolecule. It is also wise to assess the structural 
and functional stability of the analyte over the range of ex- 
perimental conditions encountered in the experiment during 
its likely time-span. 


We can distinguish two main types of effects as a result of 
variation in the chemical conditions to which biomolecules 
are exposed. Structural effects reflect often irreversible 
structural change in the molecule (e.g. protein/nucleic acid 
denaturation; hydrolysis of covalent bonds between build- 
ing blocks of which biopolymers are composed). Functional 
effects are frequently more subtle and may be reversible 
(e.g. deprotonation of chemical groups in the biomolecule 
resulting in ionization; partial unfolding of proteins). A 
detailed treatment of these effects on the main classes of 
biomolecules is outside the scope of the present volume but 
a working knowledge of the likely effects of these conditions 
can be very useful in deciding conditions for separation or 
analytical manipulation. 


1.2.1 pH Effects 


pH is defined as the negative log of the proton concentration: 
pH = — log[H*] (1.1) 


Because both the H* and OH™ concentrations of pure 
water are 1077 M, this scale runs from a maximum of 14 
(strongly alkaline) to a minimum of 0 (strongly acidic). 
As it is a log scale, one unit reflects a 10-fold change in 
proton concentration. Most biomacromolecules are labile to 
alkaline or acid-catalyzed hydrolysis at extremes of the pH 
scale but are generally stable in the range 3-10. It is usual 
to analyse such biopolymers at pH values where they are 
structurally stable and this may differ slightly for individual 
biopolymers. For example, proteins normally expressed in 
lysosomes (pH 4) are quite acid-stable while those from 
cytosol (pH 7) may be unstable near pH 3. Aqueous solutions 
in which sample molecules are dissolved usually comprise 
a buffer to prevent changes in pH during the experiment. 
These are described in more detail in Section 1.3 below. 

Many biomolecules are amphoteric in aqueous solution 
that is they can accept or donate protons. Some chemical 
groups such as inorganic phosphate or acidic amino acid 
side-chains (e.g. aspartate) can act as Bronsted acids and 
donate protons: 


AH aA aH (1.2) 
= 


Other groups such as the imidazole ring of histidine or 
amino groups can act as Brgnsted bases and accept protons: 


B- +H* -=> BH (1.3) 


The position of equilibrium in these protonation/ 
deprotonation events may be described by an equilibrium 
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constant, K,: 


K=% -ALH (1.4) 
kı [AH] 


pk, (—logK,) is the pH value at which 50% of 
the acid is protonated and 50% is deprotonated. The 
Henderson—Hasselbach equation describes variation of con- 
centrations of A~ and AH as a function of pH: 


log[A7] 


H = pK, 
PH = pKa + AH] 


(1.5) 


Functional groups present as structural components of 
biomolecules (e.g. amino acid side-chains of proteins; phos- 
phate groups of nucleotides) will have distinct K, values 
which may differ slightly from the value found in other 
chemical circumstances (e.g. the Ka values of amino acid 
side-chains in polypeptides differ from those in the free 
amino acid). Some biomolecules can contain both acidic 
and basic groups within their structure (e.g. proteins) while 
particular chemical structures found in biomolecules may be 
polyprotic, that is capable of multiple ionizations (e.g. phos- 
phate). Such biomolecules may undergo a complex pattern 
of ionization resulting in varying net charge on the molecule. 
pH titration curves for biomolecules allow us to identify pK, 
values (Figure 1.1). 

Since protonation-deprotonation effects are responsible 
for the charges on biomacromolecules which maintain their 
solubility in water, their solubility is often lowest at their 
isoelectric point, pI, the pH value at which the molecule 
has no net charge. These can also be determined by titra- 
tion using methods described in Chapter 5 (Section 5.5.3; 
Figure 5.24). 

While the pH scale reflects the situation in aqueous 
solution, many microenvironments encountered in living 
cells are quite nonpolar (see below). Good examples in- 
clude biological membranes and water-excluding regions 
of proteins (e.g. some enzyme active sites). In these envi- 
ronments, protonation/deprotonation properties of chemical 
groups may deviate widely from those observed in aqueous 
solution. For example, catalytic residues of many enzymes 
frequently display pK, values which are perturbed far from 
those normal for that residue in water-exposed regions of 
proteins. 


1.2.2 Temperature Effects 


Three main effects of temperature on biomolecules are im- 
portant for the biophysical techniques described in this vol- 
ume. These are effects on structure, chemical reactivity 
and solubility. Heat can disrupt noncovalent bonds such as 
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Figure 1.1. pH titration curves, (a) Lysine. Four protonation states are possible for lysine as shown. Three pK, values are evident from this 
at pH values of 2.18, 8.95 and 10.53. The pI of lysine is 9.74. (b) Glycine. Note that only three protonation states exist for glycine compared 
to four for lysine. pK, values are at pH values of 2.34 and 9.6 which differ slightly from the corresponding ionisations in (a). Glycine has a 


pl of 5.97 


hydrogen bonds which are especially important in the struc- 
ture of biomacromolecules. This can lead to denaturation 
of proteins and DNA or to disruption of multimolecular 
complexes in which they may be involved. Moreover, since 
covalent bonds linking building block molecules (e.g. pep- 
tide bonds; glycosidic bonds; 3’, 5’-phosphodiester bonds) 


have generally lower bond energies than bonds within such 
building blocks, extensive heating can result in disintegra- 
tion of the covalent structure of biomacromolecules. Thus 
proteins can break down into component peptides or nu- 
cleic acids into smaller polynucleotide fragments as a result 
of exposure to heat. 


Secondly, all chemical reaction processes obey the Ar- 
rhenius relationship: 


k = Ae—E,/RT (1.6) 


where k is the rate constant for the process, A is a con- 
stant, E, is the activation energy of the reaction, R is the 
Universal gas constant and T is absolute temperature. This 
relationship arises from large changes in the number of acti- 
vated molecules available for reaction as a result of change 
in temperature. The exponential dependence on temperature 
means that small changes in T can result in large effects on 
the rate constant, k. 

Thirdly, temperature usually increases the solubility of 
molecules in a solvent as well as the rate of diffusion through 
the solvent. This is because heat increases the average ki- 
netic energy of solvent molecules. In the case of water, this 
is accompanied by extensive breakdown of water-water hy- 
drogen bonds which increases the solute capacity of a given 
volume of water. Thus, for example 8M urea is soluble 
at 30°C while the limit of solubility is closer to 5M at 
4°C. Kinetic energy effects are also important in situations 
involving biological membranes because the phospholipid 
bilayer of which they are composed becomes increasingly 
fluid at higher temperature. 

Temperature is therefore usually tightly controlled dur- 
ing biophysical experiments. In dealing with biomacro- 
molecules in particular, it is generally not possible to use 
temperatures higher than 80°C and, in most cases, much 
lower temperatures are used. Moreover, samples such as 
proteins or nucleic acids are normally stored under refriger- 
ated conditions to maximize their stability. This is achieved 
with the aid of liquid nitrogen (—196 °C) or with refrigera- 
tors set at —80 or —20°C. Particular care must be taken in 
handling crude biological extracts since hydrolases such as 
proteases and nucleases present in these will be active in the 
range 18-37 °C and result in extensive degradation of pro- 
teins and nucleic acids. This can be avoided by maintaining 
low temperatures near 4°C during manipulation of sample 
and by cooling buffer solutions before dissolving biological 
samples. 

Most biomolecules are optimally active at temperatures 
similar to those experienced in the biological source from 
which they were obtained. For example, proteins from ther- 
mophilic bacteria are especially heat stable compared to cor- 
responding proteins from mesophilic bacteria while mam- 
malian proteins are optimally active around 37 °C. 


1.2.3 Effects of Solvent Polarity 


Polarity arises from unequal affinity of atoms bonded to- 
gether for shared electrons called electronegativity. Apart 
from fluorine (with an electronegativity value of 4), 
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oxygen is the most electronegative element in the periodic 
table (electronegativity value of 3.5) leading, for example, 
to oxygens of -OH groups tending to be partially negatively 
charged (87) while hydrogens tend to be partially positively 
charged (6+). Water (which is itself polar) interacts ioni- 
cally with polar functional groups such as -OH by hydrogen 
bonding (Figure 1.2). Organic molecules composed mainly 
of carbon and hydrogen, however, tend to be nonpolar as 
these atoms have similar electronegativities (2.5 and 2.1, re- 
spectively). In general, polar biomolecules dissolve readily 
in polar solvents such as water while those which are non- 
polar dissolve in nonpolar solvents (e.g. trichloromethane). 

Biomolecules lacking strongly electronegative elements 
such as oxygen and nitrogen and consisting mainly of car- 
bon and hydrogen tend to be principally nonpolar (e.g. fatty 
acids; sterols; integral membrane proteins). Conversely, 
those containing oxygen, sulfur and nitrogen tend to 
be mainly polar (e.g. monosaccharides; nucleotides). 
Biomacromolecules often contain distinct structural regions 
some of which may be polar while others may be nonpolar. 

Since water is the main biological solvent, most 
biomolecules (or parts of biomolecules) have been selected 
by evolution to interact with it in particular ways either by at- 
traction or repulsion. Polar regions strongly attracted to wa- 
ter are called hydrophilic while nonpolar regions which are 
repulsed by water are called hydrophobic. In the living cell, 
biomolecules adopt a structure determined to a large degree 
by the extent to which they are hydrophobic/hydrophilic. 
For example, biological membranes are made up of phos- 
pholipid bilayers which spontaneously form when phos- 
pholipid molecules are dissolved in water. The polar heads 
of phospholipids are on the exterior in contact with water 
while the nonpolar fatty acid components are on the interior 
of the bilayer protected from water. Cytosolic proteins ex- 
press hydrophilic groups on their surface whilst folding in 
such a manner that hydrophobic groups are protected from 
exposure to water in the interior of the protein (Chapter 6). 
Membrane-bound proteins such as hormone receptors ex- 
pose hydrophobic groups to the interior of biological mem- 
branes and hydrophilic groups to the exterior. 

In extracting, analyzing and purifying biomolecules these 
intricate structural interactions are often lost which can re- 
sult in aggregation, precipitation or loss of structure and, 
hence, of biological activity. If it is desired to retain bio- 
logical activity we use aqueous solutions to handle largely 
hydrophilic biomolecules, nonpolar solvents to dissolve 
mainly hydrophobic samples and detergent solutions for 
molecules which possess both classes of groups. Many of 
the individual techniques described in this volume use spe- 
cific solvent systems of distinct polarity/nonpolarity but it 
may occasionally be necessary to design individual solvent 
systems to take account of the requirements of particular 
biomolecules. Examples include column chromatography 
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Figure 1.2. Hydrogen bonding. Water molecules hydrogen bond because of partial charge-differences arising from the different electroneg- 
ativities of oxygen and hydrogen. Hydrogen bonds (dashed lines) are shown both in bulk water and between water and amino acid side-chains 
of glutamic acid and serine residues of a polypcptidc. Such ionic interactions maintain a solvation shell of water around the surface of 


globular proteins and other hydrated biomacromolecules 


(Chapter 2), spectroscopy (Chapter 3) and capillary elec- 
trophoresis (Chapter 5). 


1.3 BUFFERING SYSTEMS USED IN 
BIOCHEMISTRY 


A buffer is an aqueous solvent system designed to maintain 
a given pH. In the context of biochemical work, the main 
function of buffers is to resist any tendency for pH to rise 
or drop during the experiment. This can happen during any 
process which might release or absorb protons from solution 
such as, for example, during an enzyme-catalyzed reaction 
or as a result of electrochemical processes such as elec- 
trophoresis. A secondary but often crucial role for a buffer 
is to maximize the stability of biomolecules in solution. 
Frequently, additional molecules are dissolved in the buffer 
to help it to do this and these are discussed in more detail 
below. 


1.3.1 How Does a Buffer Work? 


Any aqueous solution containing both A~ and AH (Section 
1.2 above) is, in principle, capable of resisting change in 
pH. This is because, if protons are generated in the solution, 
they can be neutralized by A7: 


A` +H* > AH a7 


Conversely, if alkali is generated in the solution (which 
would tend to remove protons), it can be neutralized by AH: 


AH+OH — A +H0O (1.8) 


In practice, most buffers consist of mixtures either of a 
weak acid and its salt or of a weak base and its salt. 

Of course, the ability of a buffer to resist change in pH 
is finite, especially if the number of protons involved is 
especially large. This limit is represented by the buffering 
capacity of the buffer, 8. This is defined as the number of 
moles of [H*] which must be added to a liter of the buffer 


to decrease the pH by one unit. It can be mathematically 
calculated from the following equation: 


g = 23: Ka- IHL ICI 
(K+ [H+ 


where [C] is the sum of the concentrations of A~ and AH. 
This relationship means that buffering capacity increases 
with buffer concentration so that, for example 100mM 
acetate buffer has 50-fold greater buffering capacity than 
2 mM. It can be demonstrated experimentally that B reaches 
a maximum at pH values equal to pK, which is when the 
concentrations of A~ and AH are approximately equal. This 
means that buffers work best at pH values around their pKa. 
In practice, most buffers are effective one pH unit above 
and one below their pK, so, for example acetate buffers 
(pK, = 4.8) are useful in the pH range 3.8 to 5.8, although 
most effective around pH 4.8. 


(1.9) 


1.3.2 Some Common Buffers 


A selection of buffers commonly used in biochemistry is 
given in Table 1.3. Some of these buffer components are 
of biological origin (e.g. glycine; histidine; acetate). Good’s 
buffers were developed by N.E. Good to facilitate buffering 
in the pH range 6—10.5. Because of their complicated chem- 
ical names, these buffers are more usually known by abbre- 
viations (e.g. Pipes; Hepes; Mops). Inspection of Table 1.3 
shows that buffers are available which span the range of 
interest for physical studies of biomolecules (i.e. pH 3-10). 
When preparing buffers, it is essential that both the concen- 
tration and pH are correct since these are the two variables 
critical to buffering capacity (Equation (1.7)). 

A number of problems can arise with particular buffers 
which can limit their use in specific cases. For example, 
several buffers interact with divalent metals (e.g. phosphate 
binds Ca?*; Tris reacts with Cu?* and Ca?*) and should be 
avoided in cases where this is important to interpretation of 
the experiment. Tris buffers are especially sensitive to tem- 
perature which can result in the same buffer giving a slightly 
different pH at different temperatures. Phosphate buffers are 
particularly susceptible to bacterial contamination if stored 
for long periods of time (although this can be avoided by 
including alow concentration of sodium azide as a preserva- 
tive). Some buffer components (e.g. EDTA) may give high 
absorbance readings which can affect detection during pro- 
cesses such as chromatography. Volatile buffer components 
(e.g. formic acid; bicarbonate; triethanolamine) can be lost 
from the buffer over time leading to a gradual change in pH 
and buffering capacity. 


1.3.3 Additional Components Often Used in Buffers 


In addition to buffer components such as weak acids/bases 
and their salts, buffers frequently contain a range of other 
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Table 1.3. Some common buffers used in biochemistry 


Buffer pK, 
Phosphate“ 0.85, 1.82, 6.68 
Histidine“ 1.82, 5.98, 9.17 
Phosphoric acid“ 1.96, 6.7, 12.3 
Formic acid 3.75 

Barbituric acid 3.98 

Acetic acid 4.8 

Pyridine 5.23 


Bis TRIS [Bis-(2-hydroxy-ethyl)imino- 6.46 
tris-(hydroxy-methy])-methane] 


PIPES [1,4-piperazinebis- 6.8 
(ethanesulphonic acid)] 

Imidazole 7.0 

BES [N, N-Bis(2-hydroxy-ethyl)-2- 7.15 
amino-ethane-sulphonic acid] 

MOPS [2-(N-morpholino)propane- 1:2 


sulphonic acid] 
HEPES [N-2-hydroxyethyl-piperazine- 7.55 
N’-2-ethane-sulphonic acid] 
TRIS [hydroxymethyl)amino-methane] 8.1 
TAPS [W-tris(hydroxymethyl)methyl-2- 8.4 
aminopropane sulphonic acid] 
Boric acid 9.39 
Ethanolamine 9.44 
CAPS [3-(cyclohexylamino)-l-propane- 10.4 
sulphonic acid] 


Methylamine 10.64 
Dimelhylamine 10.75 
Diethylamine 10.98 


“There are polyprotic with several pK, values. 


components of which a selection is shown in Table 1.4. 
These may be necessary to maintain stability of the 
biomolecule, to control levels of metal ions, to ensure re- 
ducing/oxidizing conditions or to keep the biomolecule dis- 
solved and/or denatured. We will observe that some of the 
agents tabulated in Table 1.4 are used in more than one of 
the techniques described in this book and therefore repre- 
sent generally useful tools for the manipulation of chemical 
conditions to which biomolecules are exposed. 


1.4 QUANTITATION, UNITS AND 
DATA HANDLING 


1.4.1 Units Used in the Text 


Physical measurements usually result in quantification of 
some property of a molecule or system such as those tab- 
ulated in Table 1.1. Various systems of internationally- 
agreed units have been used historically to record these 
measurements but, throughout this book, the Systeme Inter- 
nationale (SI) system, the most currently-agreed scientific 
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Table 1.4. Additional reagents sometimes added to buffers 


Chemical 


Function 


2-Mercaptoethanol 

Dithiothreitol 

Sodium borohydride 

Divalent metals plus O2 

Performic acid 

Leupeptin 

Phenylmethy] sulphony] 
fluoride (PMSF) 

Ethylene diamine NNN'N' 
tetra-acetic acid (EDTA) 


Ethylene glycol-bis(6- 
aminoethyl ether) NNN‘N’ 
tetraacetic acid (EGTA) 

Urea 

Guanidinium hydrochloride 

Sodium dodecy] sulphate (SDS) 

Cetyltrimethyl ammonium 


Reducing agent 
Reducing agent 
Reducing agent 
Oxidising agents 
Oxidising agent 
Protease inhibitor 
Serine protease 
inhibitor 
Metal chelator/ 
metalloprotease 
inhibitor 
Calcium chelator 


Denaturing agent 
Denaturing agent 
Anionic detergent 
Cationic detergent 


chloride 
3-(3- cholamidopropyl)dimethy 
lammonio)-1-propane 


Zwitterionic detergent 


sulphonate (CHAPS) 
Triton X-100 Nonionic detergent 
Digitonin Nonionic detergent 


Protamine K Binds DNA 


quantification system will be used. The main units are tabu- 
lated in Appendix 1. This system has the advantage of great 
internal consistency which removes any need for conver- 
sion factors. For example, units of distance (meters) readily 
relate to units of velocity (meters/second). 

In addition to SI units, some measurements used are op- 
erational measurements commonly employed in the life sci- 
ence literature. These are units which have gained wide ac- 
ceptance in the international science community but which 
are not strictly part of the SI system. Relative molecular mass 
(M,) is expressed as Daltons (Da) or multiples thereof (e.g. 
kDa) with 12 Da being equivalent to twelve atomic mass 
units (i.e. the mass of !?C). In the case of nucleic acids, 
base pairs (bp) or multiples thereof (e.g. kbp, Mbp) are used 
as units of mass. Interatomic distances such as bond-lengths 
are generally given as angstroms (A) with 1A corresponding 
to 107!° m (i.e. 0.1 nm). 

Concentrations are given mainly in molarity (1 M solution 
being Avogadro’s Number of molecules dissolved in 11 of 
solvent) although occasionally they are expressed as per- 
centages of weight/weight (% w/w) or weight/volume 
(% wiv). Thus, 10% (w/v) would represent a solution of 
10 g per 100 ml while 10% (v/v) would represent a solution 
of 10 ml per 100 ml. 


The most commonly used temperature scale in the text is 
the Celsius scale although absolute temperatures (in units of 
Kelvin, K) are specifically referred to by T (e.g. Equation 
(1.6) above). 


1.4.2 Quantification of Protein and 

Biological Activity 

Most of the techniques described in this text are used to sep- 
arate or analyse biomolecules or mixtures containing them. 
In carrying out this kind of experimentation it is crucial 
to know exactly how much sample is being applied since 
most of the systems described are highly loading-sensitive. 
In the case of pure samples this may not be a problem but 
many samples encountered in biochemistry may be quite 
crude and heterogeneous. A common strategy for quantify- 
ing such samples is to estimate their protein content (e.g. 
by the Bradford method, Figure 3.18, or by one of the other 
methods mentioned in the bibliography at the end of this 
chapter) and to load a standard amount of protein. Since 
the ratio of protein to the other components of the mixture 
is fixed, this normally ensures uniform loading. Similarly, 
when quantifying the biological activity of a sample (e.g. 
enzyme activity, antibody content, antiviral activity) it is of- 
ten useful to express this as specific activity that is units/mg 
protein. This is a measure which is independent both of 
sample volume and sample concentration. 

The majority of the approaches described measure rela- 
tive properties of biomolecules rather than absolute proper- 
ties. Examples of this would include M, estimation by mass 
spectrometry, gel filtration and electrophoresis, pI estima- 
tion by isoelectric focusing, secondary structure estimation 
of proteins by circular dichroism and determination of chem- 
ical shifts in NMR spectroscopy. For this reason, a common 
strategy found in many of the techniques is to compare the 
sample being analysed to a series of well-characterized stan- 
dard molecules using well-established procedures which 
have been optimized for that particular method. It is impor- 
tant to understand that measurements obtained in this way 
are therefore highly dependent on standard measurements 
being of good quality and that this may vary somewhat from 
method to method. 


1.5 THE WORLDWIDE WEB AS A 
RESOURCE IN PHYSICAL 
BIOCHEMISTRY 


1.5.1 The Worldwide Web 


The worldwide web was originally devised as a distributed 
computer network for the military capable of withstanding 
nuclear attack! In the last decade, it has grown to include 


millions of individual entries called web pages containing 
information on almost every conceivable subject. These web 
pages exist on acomputer somewhere in the world but can in 
principle be accessed by other computers through the web. 
At the time of writing, (March, 2008) it is estimated that 20% 
(1320 million) of the Earth’s 6606 million population have 
access to the web (http://www.internetworldstats.com/stats). 
We can connect to web pages on the web with an appropriate 
browser suchas Internet Explorer. However, due to the sheer 
mass of material being constantly added to and changed on 
the web, we normally use a search engine to find pages 
on specific named topics. Google is a good example of a 
general purpose search engine. It should be remembered 
that no search engine gives 100% coverage so results from 
a search could represent as few as 25% of the total possible 
pages on a given topic. 

The ‘address’ of a particular web page is given by a uni- 
form resource locator (URL). Examples of URLs include 
http://www.google.com for google and http://alta-vista.;com 
for altavista. The prefix http:// is to tell the receiving com- 
puter that it can expect a communication in hypertext trans- 
fer protocol — the most common format allowing one com- 
puter to communicate with another. The rest of the URL 
defines a location, that is a computer containing the relevant 
file. The ending .html which often occurs in URLs signi- 
fies hypertext markup language, the language in which web 
pages are written. 


1.5.2 Web-Based Resources for Physical 
Biochemistry 


The web provides several resources of use in Physical Bio- 
chemistry. Individual web pages are available which de- 
scribe various experimental techniques thus complementing 
published work such as review articles and textbooks. There 
are also databases which are archives of one particular cat- 
egory of information. Examples would include sequence 
databases, databases of NMR spectra, the three dimensional 
structure database and databases of two-dimensional elec- 
trophoresis patterns. The best databases are curated (i.e. 
they are looked after and regularly updated by some rep- 
utable body) and they are annotated (which means each 
entry contains extra information such as literature citations, 
references to other related entries, etc.). These features make 
databases part of the daily life of modern molecular life sci- 
entists. Even though many resources on the web are not 
peer-reviewed in the way that say research articles are, most 
authoritative databases achieve the same result by main- 
taining a close link with the peer-reviewed literature. Con- 
versely, it is becoming increasingly common for research 
articles to be submitted to journals in electronic format and 
for peer-reviewed articles to appear on the web long be- 
fore the paper version. A third set of very useful resources 
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on the web is made possible by the availability, often as 
freeware (i.e. for free!) of computer programs which help 
us to analyse or represent our data differently. Examples 
would include graphics programs which allow us to view 
the three dimensional structures of biomacromolecules en- 
coded in protein databank (PDB) files (Chapters 6 and 9) 
or hydropathy plots which allow us to identify hydrophobic 
regions of amino acid sequences (Chapter 9). 

The ever-closer links between molecular life sciences 
and information technology (IT) is represented in the rel- 
atively new discipline of bioinformatics which is introduced 
in Chapter 9. In this book relevant URLs for web-based 
resources are given at the end of each chapter. 

In addition to text and programs, the wordwide web can 
be searched for images or videos using the standard search 
engines. A word of caution about using this type of search- 
ing in an academic context. The fact that material is on 
the web does not absolve us as scientists from respecting 
copyright law so permission should always be obtained to 
reproduce images, text or videos obtained from the web just 
as we would in using such content from a published source. 
Secondly, we should always take care to refer back to the 
primary literature as this is the bedrock of modern science 
and is likely to remain so as long as rigorous peer-review 
prevails. 


1.6 OBJECTIVES OF THIS VOLUME 


All of the techniques mentioned in this book merit one or 
more volumes to describe fully their potential for the future 
of life science research. In the bibliography at the end of each 
chapter the reader will find a list of such specialist texts and it 
is hoped that the present book will act as a general introduc- 
tion to specialist biophysical techniques. In addition, recent 
review articles are cited which will bring the reader more 
up-to-date on specific applications of individual techniques. 
It is not the intention of the text to supplant such special- 
ist literature but rather to guide students towards a greater 
understanding of the potential of biophysical approaches to 
biochemistry. 

A chapter is devoted to each technique or group of tech- 
niques which describes the physical basis, advantages, lim- 
itations and opportunities it offers. This is presented in a 
generally nonmathematical way to maximize its accessibil- 
ity (more detailed treatments may be found in the special- 
ist texts). Moreover, the relationship between techniques is 
strongly emphasized because several combinations of indi- 
vidual techniques often offer advantages over single experi- 
mental approaches. In particular, recent advances have seen 
the combination of techniques such as mass spectrometry, 
chromatography, electrophoresis and spectroscopy as hy- 
phenated or multi-dimensional analytical techniques. Care 
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has also been taken to emphasize how biophysical ap- 
proaches often complement biological and chemical exper- 
imentation to give a fuller understanding of biochemical 
systems. 

Specific examples of applications of the approaches de- 
scribed are given in boxes throughout the text. These are 
meant to give a flavour of their versatility and power for the 
solving of many different types of problems in biochem- 
istry. The bibliography contains many more examples such 
as, for example, applications in clinical laboratories and in 
industry. Articles and books (e.g. laboratory manuals) con- 
taining practical hints to novices contemplating using these 
techniques are also cited. 

Finally, it is hoped that this book will furnish the student 
with sufficient understanding to allow them to understand 
and grasp as-yet undeveloped biophysical approaches which 
may appear in the next decade or so by noticing the common 
factors underlying the methods described as well as their 
diversity. 
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Chapter 2 
Chromatography 


Objectives 


¢ The physical basis of chromatography. 


of interest. 


After completing this chapter you should be familiar with: 


¢ The chemical basis of the principal chromatography methods used in biochemistry. 

e Performance criteria which can be used to compare chromatography systems. 

e The range of different chromatography formats used in biochemistry. 

e How one might approach design of a purification protocol, for example to purify a specific protein 


Living cells contain hundreds of thousands of distinct chem- 
ical species. These include large molecules like proteins, 
nucleic acids, lipids as well as lower molecular weight 
molecules which act as building blocks for biopolymers or as 
components of complex metabolic pathways. Some of these 
molecules are present in only trace amounts (e.g. interme- 
diates in enzyme mechanisms) whilst others are present in 
abundance (e.g. structural proteins). Moreover, some com- 
ponents are present only at certain stages of the cell-cycle, 
whilst others are present at approximately constant levels. 
Study of individual chemical components of cells can, there- 
fore, give us an insight into many fundamental cellular pro- 
cesses and help us to understand the dynamics of cell com- 
position and function. 

One approach to the study of individual chemical species 
is to separate them from each other by analytical or prepara- 
tive chromatography. Originally, this technique was used 
by Tswett (1903) in the separation of plant pigments 
(chromatography comes from the Greek, chroma, meaning 
colour) but we now know that it is applicable to all chemical 
species, whether coloured or not. Because of the large range 
of size, shape and hydrophobicity found in biomolecules, it 
is to be expected that no one chromatography technique will 
suffice for all separations. In this chapter, the basic princi- 
ples of chromatography will be described to explain why 
different molecules are separable. Some examples of the 
main chromatographic techniques used in Biochemistry are 
then given to illustrate how biomolecules are separated in 
practice. 


2.1 PRINCIPLES OF 
CHROMATOGRAPHY 


2.1.1 The Partition Coefficient 


When applied to any two-phase system (e.g. liquid-liquid, 
liquid—solid), a molecule may partition between the phases 
(Figure 2.1). The precise ratio of concentration achieved 
is ultimately determined by inherent thermodynamic prop- 
erties of the molecule (in turn, a function of its chemical 
structure) and of the phases. In the case of a liquid—liquid 
system, the relative solubility of the molecule in each liq- 
uid will be very important in determining partitioning. In 
a liquid—solid system, different sample molecules may ad- 
sorb to varying degrees on the solid phase. Both partition 
and adsorption phenomena are possible in a column sys- 
tem and this is called column chromatography. In column 
chromatography, one phase is maintained stationary (the sta- 
tionary phase) while the other (the mobile phase) may flow 
freely over it. We can express the concentration ratio in such 
a system as the partition coefficient, K: 


(2.1) 


where C, and Cm are the sample concentrations in the sta- 
tionary and mobile phases, respectively. When a mixture 
made up of several components is applied to such a two- 
phase system, each component will have its own individual 
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Figure 2.1. Partitioning of biomolecules in a two-phase system. 
Two components are represented by circles and squares, respec- 
tively. The two phases could be an aqueous buffer and a solid 
stationary phase. The two samples have very different partition 
coefficients in this experimental system. 


partition coefficient. As a result, each will interact slightly 
differently with the stationary phase and, because of differ- 
ent partitioning between phases, will migrate through the 
column at different rates. Since K will be directly affected 
by the precise experimental conditions (e.g. temperature, 
solvent polarity) the chromatographer may vary these to op- 
timize separation. In column chromatography, we therefore 
exploit what are often tiny differences in the partitioning 
behaviour of sample molecules to achieve their efficient 
separation. 


2.1.2 Phase Systems Used in Biochemistry 


In chromatographic systems used in biochemistry the sta- 
tionary phase is made up of solid particles or of solid 
particles coated with liquid. In the former case, chemi- 
cal groups are often covalently attached to the particles 
and this is called bonded phase liquid chromatography. In 


the latter case, a liquid phase may coat the particle and 
be attached by noncovalent, physical attraction. This type 
of system is called liquid-liquid phase chromatography. A 
good example of liquid-liquid phase chromatography is sil- 
ica coated with a nonpolar hydrocarbon (e.g. C-18 reverse 
phase chromatography; Section 2.4.3). Commonly, the par- 
ticles are composed of hydrated polymers such as cellulose 
or agarose. Such particles may be immobilized in a column 
(Section 2.3.1) and washed with mobile phase. They offer 
good flow characteristics and possess sufficient mechanical 
strength and chemical inertness for the chromatography of 
biomolecules. Because biomolecules have evolved to func- 
tion in an aqueous environment, it is usually necessary to 
use aqueous buffers as the mobile phase if we require the 
molecule to retain its native structure (e.g. in the purification 
of active enzymes). If the native structure is not required, 
however, then it is possible to use more ‘nonbiological’ 
conditions such as organic solvents (e.g. in purification of 
peptides by reverse phase chromatography; Section 2.4.3). 

Liquid-solid or liquid-liquid phases are the most com- 
mon phase systems used in biochemistry. However, in spe- 
cialized situations other phases may be used. For example, 
gas-solid and gas-liquid phases are used in gas chromatog- 
raphy (GC; Section 2.1.4). Regardless of the precise phase 
composition, chromatographic separation is a direct result 
of the different K values of each sample component. 


2.1.3 Liquid Chromatography 


To minimize loss of biological activity, separations are often 
carried out in aqueous buffers below room temperature. Low 
temperatures are especially important in the chromatogra- 
phy of cell extracts during, for example, protein purifica- 
tion. This reduces protease activity which might otherwise 
destroy the protein of interest. Chromatography with liquid 
mobile phases is called liquid chromatography (LC). 

LC uses an experimental system outlined in Figure 2.2. 
Separation takes place in a column which contains the sta- 
tionary phase. The volume and shape of the column will 
depend on the amount of sample to be separated and on the 
mode of chromatography to be used. Buffer is stored in a 
reservoir and is pumped through tubing onto the column. 
Appropriate valves allow the convenient injection of sample 
into this flow or the formation of gradients with a second 
buffer if required. The stationary phase is packed in the col- 
umn and, as the sample passes through the bed of station- 
ary phase, separation occurs. In partition chromatography 
modes, the sample separates into individual components as 
it passes through the stationary phase (e.g. gel filtration; 
Section 2.4.2). In adsorption chromatography modes, how- 
ever, it is necessary first to load the entire sample and later 
to fractionate it. A good example of adsorption chromatog- 
raphy is ion exchange chromatography where the sample 
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Figure 2.2. A typical liquid chromatography system. The direction of flow is shown by arrows. Sample is loaded via injection through a 
valve. If a large number of samples are required an autosampler may be used to reload the column repetitively after each chromatography. 


components are eluted by means of a gradient of competing 
salt counterions after all the sample has been loaded onto 
and washed completely through the column (Sections 2.3.4 
and 2.4.1). Separated sample components elute from the 
column and are detected. A typical LC separation is shown 
in Figure 2.3. 


2.1.4 Gas Chromatography 


Many biomolecules are sensitive to high temperatures that 
can lead to destruction of structure and function (see Section 
1.2.2). However, some molecules of importance in biochem- 
istry may be converted to derivatives that are structurally sta- 
ble, though volatile, in the temperature range 200-250 °C. 
Good examples are trimethylsilated sterols and carbohy- 
drates (esterified at their hydroxyl groups), and methylated 
esters of fatty acids. A second category of molecules (e.g. 
ethanol) is stable and volatile at somewhat lower tem- 
peratures without derivatization. Both of these groups of 


molecules may be analysed by gas chromatography (GC, 
also sometimes called gas-liquid chromatography; GLC). 
This is a form of partition chromatography in which a 
volatile sample is carried in an inert gas mobile phase (the 
carrier gas) such as nitrogen, helium or argon and applied to 
a narrow column (0.1 to 0.5 cm diameter) ranging in length 
from one to thirty meters that contains a liquid stationary 
phase. Sample is introduced to the system by injection from 
a syringe through a rubber septum into an injection port. 
This port is held approx. 10°C above the column tempera- 
ture to ensure efficient volatilization. The column is located 
in an oven that maintains a high temperature. Control of the 
temperature in this oven is crucial to successful chromatog- 
raphy and gradients of temperature may be used to achieve 
good separation. Because the sample components partition 
differently between the phases and because the column is 
so long, they separate efficiently from each other. The col- 
umn may either be packed with solid particles coated with 
a liquid phase or it may be a capillary column in which the 
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Figure 2.3. A typical liquid chromatography experiment. This 
shows separation of a group of isoenzymes by ion exchange chro- 
matography. A 0-100 mM NaCl gradient (dashed line) is used with 
detection at 280nm. This wavelength is often used to detect pro- 
teins. Note that peak 1 contains unbound material while proteins in 
peaks 2 to 4 are bound with progressively greater affinity. 


inner wall of the column itself is coated with liquid. The 
latter have much smaller inner diameters (e.g. 25 um) and 
are generally longer (10-100 m) than packed columns. 

Sample components may be detected using flame ioniza- 
tion, flame photometry or thermal conductivity detectors. 
In flame photometry detectors, the sample is combusted to 
produce electrons and other ions that produce an electric 
current while in flame photometry detectors the change in 
conductivity of a wire is altered by the sample. As with the 
injection port, detectors are held approx. 10°C above the 
temperature of the column. If the separated sample com- 
ponents are to be collected for further analysis (or for an 
interfaced method such as mass spectrometry; see Section 
4.4.3), the carrier gas flow may be split before it enters the 
flame ionization detector. 

Varying column temperature, gas flow and type of column 
may optimize GC separations. 


2.2 PERFORMANCE PARAMETERS 
USED IN CHROMATOGRAPHY 


Varied types (e.g. GC, LC) and modes (e.g. gel filtration, ion 
exchange) of chromatography are used in biochemistry. It 
is therefore very useful to be able to compare and quantify 
separations so that we can optimize them for a particular 
purpose. For example, in an analytical HPLC separation, we 
might wish to replace a particular column with one from a 
different manufacturer or with a slightly different stationary 


phase. The question would then arise, how effective is the 
new column compared to the old one? The yardsticks that 
allow us to make such comparisons we will call performance 
parameters. 


2.2.1 Retention 


When chromatography is carried out as described in Fig- 
ures 2.2 and 2.3, the various sample components separate 
on the column because they are retained to different extents. 
We can describe this retention in terms of time (taking zero 
as the time of sample injection) or volume (calculated from 
volume at sample injection). These are called retention time 
(tr) and retention volume, (Vp) respectively (see Figure 2.4). 
We can interconvert these terms using Equation (2.2): 


Wa f-t (2.2) 


where fp is the retention time (min), Vp the retention volume 
(ml) and f is the flow rate (ml/min). Under a single set of 
experimental conditions (temperature, column composition 
and dimensions, etc.), each component of the sample will 
have a characteristic retention (i.e. tg or Vp) that can often be 
used to identify it in different samples. In general, retention 
is proportional to the length of the column used and is in- 
versely proportional to the flow rate. Moreover, because the 
response of the detector usually relates to the concentration 
of the sample components, we can quantify each component 
by measuring their peak areas. 
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Figure 2.4. Retention in column chromatography. A typical chro- 
matography trace showing the separation of two components; 1 and 
2. The retention volumes of the components are shown by V, and 
V2, respectively, while the base peak widths are denoted by W, and 
W2. The void volume is denoted by Vo. 


Sample components move through the column more 
slowly than does mobile phase. We can therefore define 
a retardation factor (R.F.) for each component as follows: 


Vsample 


R.F. = (2.3) 


Vmobile phase 


where V denotes rate of movement. 

Even if the sample did not adsorb onto or partition into 
the stationary phase, it would take a finite amount of time 
before it eluted from the column. The volume required for 
the injected solvent merely to pass, unretained, through the 
column and to be eluted is called the void or excluded vol- 
ume (Vo, see Figure 2.4). This is the contained volume of the 
column, plus that of injection valve and tubing, minus the 
volume of the polymeric component of the column pack- 
ing. Retention volumes greater than Vo are said to occur in 
the included volume and it is here that the sample may be 
fractionated. 


2.2.2 Resolution 


A sample chromatographic separation of two components of 
a mixture, 1 and 2, is shown in Figure 2.4. As the two peaks 
are ‘resolved’ from each other, we use the term resolution, 
R, to describe their separation. This important performance 
parameter in chromatography allows us to compare how 
effective a given column is in separating a mixture. R has 
a definite mathematical meaning (Equation (2.4)) that may 
be calculated from the trace shown in Figure 2.4: 


V- Vi 

= ——— (2.4) 
(Wi + W2)/2 
W is the base width of the peak of each sample component 
of retention volume, V. It is clear from this equation that 
the best resolution chromatographies are those in which the 
peaks are narrow (i.e. small W) and well separated (e.g. 
large V — Vj). 


2.2.3 Physical Basis of Peak Broadening 


If a sample of a single component is applied to a chromatog- 
raphy system in a small volume, it might be expected to elute 
in a very sharp peak of similar volume. This is not usually 
observed, however, due to a phenomenon called band broad- 
ening, that is the sample elutes in a much larger volume than 
that in which it was applied. The considerations summarized 
in Equation (2.4) indicate that this is a major factor in deter- 
mining the R-value of a given chromatography column. To 
understand why some chromatography experiments achieve 
separation while others do not, it is necessary to understand 
the physical basis of band broadening. The applied sample 
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may interact with the stationary phase, resulting in diffusion 
of the sample or it may become involved in mass transfer 
phenomena within the mobile or stationary phase. These are 
represented diagrammatically in Figure 2.5. 

Eddy diffusion arises from a tendency of the mobile phase 
to experience eddies or local regions of circular flow similar 
to that observed in whirlpools. Mobile phase passes through 
channels in the stationary phase (usually composed of small 
particles) that will vary somewhat in size, some being wide 
and some narrow. Circular flow in narrow channels is gen- 
erally slower than that in wide channels. Since sample will 
diffuse through a variety of channel sizes, it will be grad- 
ually distributed between fast-flowing, wide channels and 
slow-flowing, narrow channels. As a band of sample moves 
through the stationary phase it is broadened as a result of 
this diffusion. 

Mass transfer phenomena arise from distribution of sam- 
ple molecules within either the mobile or stationary phases. 
Mobile phase mass transfer occurs because mobile phase 
adjacent to the particle moves more slowly than mobile 
phase in the middle of the channel. This is due to frictional 
resistance to flow of the mobile phase by the particle. 

Stagnant mobile phase mass transfer arises because dif- 
ferent sample components enter channels that are not ori- 
entated in the same direction as the main flow through the 
column. These molecules are retained for varying amounts 
of time in the stagnant mobile phase in these channels. 
Therefore, the sample will be retained to varying extents. It 
is also possible for mass transfer to occur within the station- 
ary phase in stationary phase mass transfer (i.e. for sample 
to enter the surface of the stationary phase). 


2.2.4 Plate Height Equation 


One way to envisage the chromatographic separation of a 
sample is mentally to divide the column into very thin, hor- 
izontal sections that we call theoretical plates. If we take 
a simple example of a two-component mixture in which 
the components have partition coefficients of 1 and 0.1, re- 
spectively, and follow how it behaves as it moves through 
a small number of succeeding plates we can easily see how 
separation depends on the difference in component partition 
coefficients (Example 2.1). The theoretical plate number, 
N, is an important performance indicator of a chromatogra- 
phy column and is directly related to the surface area of the 
particles of which the stationary phase is composed. It may 
be calculated from the relationship between a chromatogra- 
phy peak’s retention time, fg and width at its base, W, using 
Equation (2.5): 


ÍR 2 
N = 16. (5) (2.5) 


16 PHYSICAL BIOCHEMISTRY 


Sampe > HHH 44444 EHF pS 


Il uu | iv | 


| 


Narrow 
channel 


Broad 
channel 


© sampe => |] 14} } 444444444 pi 


bandwidth 


Stagnant mobil 


phas € eee! 
| eS t 


Broadened 
band 


particles 


| g Stationary phase 


@) sampe > PELL ddd d ddd 


Sample —> PIII H = fini 


Slower-moving 
mobile phase 


TN 


Broadened 
band 


| 


Faster-moving 
mobile phase 


} Initial 
bandwidth 


Broadened 
band 


Sample can enter 
stationary phase 


Figure 2.5. Physical causes of band broadening. (a) Eddy diffusion, (b) Mobile phase mass transfer, (c) Stagnant mobile phase mass transfer, 
(d) Stationary phase mass transfer. All of these may simultaneously contribute to broadening of the comparatively narrow initial bandwidth 


of applied sample. 


The higher the value of N, the better the chromatographic 
performance we can expect to obtain from a column. This 
number may be compared between different columns and 
may even be used to compare bioseparations achieved in 
other types of experimental systems (e.g. electrophoretic 
separations; see Section 5.9.1). In practice, it may be difficult 
to measure W accurately so an alternative relationship may 
be used in which the peak width at half the peak height, Wn, 
is used (Equation (2.6)): 


tR 2 
N = 5.54 (=) (2.6) 


Two chromatography columns of different length (e.g. 
15 and 30cm) might contain the same number of theoret- 


ical plates (e.g. a typical 15 x 0.46cm C-8 reverse phase 
column contains approx. 13 000 theoretical plates). Clearly, 
the shorter column would be regarded as giving better chro- 
matographic performance. We can express the number of 
theoretical plates as a function of column length with the 
plate height number, H , also sometimes known as the height 
equivalent to a theoretical plate, HETP (Equation (2.7)): 


Ha (2.7) 


where L is column length. The better the chromatographic 
performance of a column, the smaller this number should 
be. In our example of columns of 15 and 30 cm length, the H 
value of the former (1.15 um) will be half that of the latter 
(2.3 um). 
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Example 2.1 Theoretical plates in chromatography 


Theoretical plate is a term borrowed from the theory of distillation in which fractionation of samples between different 
phases occurs. In column chromatography, a theoretical plate may be imagined as a very thin section within which 
molecules can partition between mobile and stationary phases. The thinner this section, the better the chromatographic 
performance of the system. The concept of theoretical plates is useful in understanding how the partitioning behaviour 
of sample components determines their chromatographic separation. 
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A sample made up of 1000 molecules of two types of components (500 of each) is applied to a chromatography system 
consisting of a mobile and stationary phase. These differ from each other in the value of their partition coefficient, k, 
which is the ratio of the concentration of the molecule in the stationary phase to that in the mobile phase. In this example, 
the values for k are 1.0 (white bars) and 0.1 (grey bars), respectively. This indicates that the molecule represented by the 
white bars partitions 1 : 1 between the phases while that represented by the grey bars partitions 9: 1. Panels a) to f) show 
expected partitioning of the two components between the stationary and mobile phases as they progress through the first 
six theoretical plates. 

(a) In the first plate the molecules with k = 1.0 partition evenly between the two phases (i.e. 250 in each) while the 
molecules with k = 0.1 partition very differently (i.e. 450 in stationary phase and 50 in mobile phase). The molecules in 
the mobile phase pass freely to the next theoretical plate (arrow). 

(b) In this plate, they again partition 1 : 1 and 9: 1 according to their values of k. Molecules left behind in the stationary 
phase of the first plate also partition in these ratios (i.e. 250 —> 125: 125, etc.). As molecules move through succeeding 
plates, those in the mobile phase constantly redistribute themselves according to their partition coefficients as do those 
left behind in stationary phase of earlier plates. 

(g) Shows the profile of molecules in mobile phase as they might elute from the system (ignoring band broadening 
effects described in Section 2.2.3). The molecules represented by the grey bars are strongly retained by the stationary phase 
while those represented by the white bars are much less strongly retained. However, the components have clearly begun 
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to separate as a result of passage through only six theoretical plates. It should be noted that a real chromatography system 
would contain many thousands of theoretical plates. This example shows how differences in k underlie chromatographic 
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Since W is the denominator in Equation (2.5), there is a 
close relationship between the number of theoretical plates 
in a chromatography column and the extent of band broaden- 
ing experienced by sample as it passes through the column. 
A column containing a large number of theoretical plates 
would be expected to cause little band broadening to sample 
while the opposite is true for columns containing a small 
number of theoretical plates. 

We can write an overall plate height equation for a chro- 
matographic system as follows (Equation (2.8)): 


H = Hı + H + H; + H4 (2.8) 


where H;—H; are individual plate height values for the main 
causes of band broadening described in the previous sec- 
tion (H;, eddy diffusion; H) mobile phase mass transfer, 


etc.). Since in liquid chromatography Hy, (the contribution 
of stationary phase mass transfer) is small, we can gener- 
ally ignore it. For such a system the overall plate height is, 
therefore, given by Equation (2.9): 


H C.-d mde -V Com p 2.9 
=O t (2.9) 


where Ce, Cm, Csm and C, are coefficients for the particular 
class of particle used as stationary phase, d, is particle di- 
ameter, dr is the thickness of stationary phase film, v is the 
flow rate of mobile phase, Dm and D, represent the diffusion 
coefficients of sample in the mobile and stationary phases, 


respectively. 


We can simplify this relationship as follows (Equation 
(2.10)): 


Cet Cn: dp: v0 Com: dy-v 
4 =4,( D P 4 D. 


) (2.10) 


The principal factors leading to extensive peak broaden- 
ing, and hence high values for H, are therefore; the class of 
stationary phase selected; the particle diameter dp; the flow 
rate v; the molecular size of the sample molecules (which 
will affect Dm). The chromatographer can achieve lower 
values of H by judicious choice of stationary phase particle 
and, especially, by using particles with small diameters. This 
is the physical basis for high resolution in systems such as 
HPLC (Section 2.6) and FPLC (Section 2.7). It should also 
be noted that separations can also be potentially improved 
by using slower flow rates and elevated temperature (which 
tends to increase Dm). 

A plot of H versus the flow rate of the mobile phase, v, 
is called the van Deemter curve (Figure 2.6). This may be 
used to fit experimental data to allow comparison of differ- 
ent stationary phases and to determine the optimum flow 
rate of the mobile phase giving lowest values of H in a par- 
ticular chromatography system. It is based on an empirical 
relationship between H and three factors (Equation (2.11)): 


B 
H=A+—4+C-v (2.11) 
v 


where A represents eddy diffusion, B is longitudinal diffu- 
sion (i.e. diffusion along the direction of flow) and C de- 


Optimum 
flow rate 


(um) 


Flow rate, v (cm/h) 


Figure 2.6. The van Deemter curve. A plot of H versus flow rate, v, 
is shown (solid line). Variation of the three components of Equation 
(2.16) with v is shown individually. 
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scribes mass transfer effects. Slower flow rates give better 
chromatography and this is reflected in the partial depen- 
dence of H on the product of C and v (i.e. a decrease in 
C - v gives a smaller H). At extremely slow flow rates, H 
increases, as a result of the partial dependence on the ratio 
B/v. A very small v is the denominator of the B/v term that 
would therefore tend to be very large in this region of the 
plot, giving a large value for H. The point where these two 
tendencies balance represents the optimum value for v. In 
practice, this value is usually found to occur at prohibitively 
low flow rates that would result in extremely long analysis 
times. For this reason, flow rates higher than the optimum 
value are used allowing shorter analysis times, albeit at the 
expense of loss of separation efficiency. 


2.2.5 Capacity Factor 


When there are several components in a sample separated by 
chromatography as described in Figure 2.4, we can write, for 
each component, an equation for retention volume (Equation 
(2.12)): 

Vi = Vo (1 + kı) (2.12) 
where Vo is the void volume defined in Section 2.2.1 and 
kı is the capacity factor for component 1. This capacity 
factor is partly dependent on partition coefficient and can 
easily be calculated from chromatograms using Equation 
(2.12). Since separation of component 1 from component 2 
is determined largely by their different partition coefficients 
as shown in Example 2.1 we can further define a separation 
factor, a, for this separation (Equation (2.13)): 


_ k 


æ= — 
k2 


(2.13) 


This is also a performance indicator since high separa- 
tion factor values denote good chromatography while low 
separation factors are associated with poor separation. 


2.2.6 Peak Symmetry 


Many chromatographic separations achieve baseline sepa- 
ration; that is no overlap between individual peaks in the 
chromatogram. Moreover, it is often assumed that the peaks 
will possess a Gaussian shape. This is the ideal situation 
and may allow us to calculate N directly from the peak 
(Equations (2.5) and (2.6)). In practice, peak symmetry may 
be significantly distorted from Gaussian. This arises from 
a variety of factors such as the use of nonlinear flow-rates, 
incomplete separation of peaks (e.g. the presence of ‘shoul- 
ders’ on peaks, an important clue to peak contamination) 
and the use of gradient elution techniques all of which are 
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frequently encountered in practical chromatography. Peak 
symmetry can, therefore, give us valuable clues as to the 
performance of chromatography systems. 


2.2.7 Significance of Performance Criteria 
in Chromatography 


Performance criteria may be used for the comparison of 
otherwise very different chromatography systems. They are 
independent of the chemical nature of the adsorption of sam- 
ple to the stationary phase (Section 2.4) because they depend 
primarily on physical interactions between sample compo- 
nents and the stationary phase. It is possible, therefore, to 
use these criteria meaningfully to compare, for example, re- 
verse phase HPLC separations to open column ion-exchange 
chromatography with DEAE-cellulose. This comparison is 
also independent of the nature of the sample components, 
being equally applicable to peptides and sterol derivatives, 
for example. For ‘good’ chromatography we would require 
high resolution (R), a large number of theoretical plates (N) 
in the column, a low plate height number (H) and high 
values of the separation factor (œ). Poor chromatography 
may be readily recognized by these criteria and improved 
purification quantified by their use. 


2.3 CHROMATOGRAPHY EQUIPMENT 


2.3.1 Outline of Standard System Used 


A wide variety of experimental systems may be used to 
achieve chromatographic separation. The degree of com- 
plexity of the system depends on the degree of automation 
used. However, the general arrangement is as shown in Fig- 
ure 2.2. The simplest, manual chromatography set-up for 
open column chromatography (Section 2.5) may be assem- 
bled in the laboratory using plastic bottles as reservoirs and 
glass columns which may be blown onto sintered glass fun- 
nels. Detection in such a system may be achieved by making 
manual absorbance measurements after separation. In sucha 
system, flow may be driven simply by gravity or else by use 
of a peristaltic pump. Inexpensive glass and plastic columns 
are widely available from commercial suppliers. 

The principal advantage of automation is the possibility 
of achieving more reproducible separation. A modest degree 
of automation is possible by the use of peristaltic pumps, 
on-line detectors and fraction collectors. However, the in- 
troduction of specialized microprocessor-controlled chro- 
matography workstations has made possible the sequential 
separation of multiple samples, fine-tuning of chromatog- 
raphy programs to optimize resolution, automatic peak col- 
lection and calculation of peak area and symmetry as well 
as overlaying of chromatograms for comparison. 


2.3.2 Components of Chromatography System 


Buffers and other solutions required for preparation of mo- 
bile phase should be filtered carefully before use to re- 
move particulate matter and other contaminants. For high- 
resolution chromatography, they should also be degassed 
although automatic sparging with oxygen-free nitrogen or 
helium gas is available in some systems. Mobile phase is 
pumped from a reservoir onto the column. Where more than 
one reservoir contributes to the mobile phase (e.g. in bi- 
nary, ternary and quaternary gradients), flow from these 
reservoirs is mixed together in a mixer before arrival at the 
column. In open column chromatography (see Section 2.5), 
flow may be achieved by gravity or the use of peristaltic 
pumps. In FPLC (see Section 2.7) and HPLC (Section 2.6) 
high-precision pumps drive mobile phase through the sys- 
tem. Separation takes place in a column that contains the 
stationary phase. This may be constructed from glass, plas- 
tic or steel depending on the pressure of the system. In 
HPLC, a further column called a guard column may pre- 
cede the chromatography column in the flow. The function 
of this is to protect the latter column from clogging and to 
prolong its useful life. 

On-line detection may be by absorbance, fluorescence, re- 
fractive index, electrochemical or some other physicochemi- 
cal measurement. Absorbance measurements at 280 nm will 
detect most proteins although, to be sure not to miss proteins 
or peptides poor in aromatic residues, detection in the range 
205-220 nm (which detects peptide bonds which have a Amax 
at 214nm) is often preferable. Fluorescence and refractive 
index detection allow greater sensitivity and are especially 
suitable for analytical chromatography. A plot of detector 
signal versus the retention of sample components is called an 
elution profile. Separated peaks may be collected by means 
of a fraction collector. 

Before chromatography, it is advisable to filter or cen- 
trifuge sample to remove particulates that could potentially 
clog the column. Sample introduction is by direct flow onto 
the column (in open column chromatography) or by injec- 
tion through a sample injection port into the mobile phase 
stream in high-resolution systems. For multiple samples an 
autosampler is used and this is especially useful in analyti- 
cal separations. Since open column separations are generally 
quite slow, it is often desirable to carry them out at lower tem- 
peratures such as in a cold-room or a refrigerated cabinet to 
minimize loss of biological activity. Higher resolution chro- 
matography may usually be performed at room temperature. 


2.3.3 Stationary Phases Used 


As mentioned in Section 2.1.2, hydrated polymers are most 
commonly used in chromatography of biochemical sam- 
ples. A wide range of these and other packings are now 
commercially available and a selection of these, together 


Table 2.1. Some stationary phases used in chromatography 


Packing Composition 
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Application 


DEAE/CM.-cellulose Polysaccharide (cellulose) 


Glutathione-agarose Polysaccharide (agarose) 


IDA-agarose 
Sephacryl S-300 


Polysaccharide (agarose) 
Polysaccharide (dextran/ 

bis acrylamide) 
Polysaccharide (dextran) 
Polysaccharide (agarose) 
Silica 
Poly(styrene-divinylbenzene) 
Polysaccharide (cellulose) 


Sephadex G-25 
Superose 12 

C-18 Silica 

POROS 
DEAE/CM-MemSep 


with their applications, is shown in Table 2.1. The station- 
ary phase is required to be mechanically stable (especially to 
high pressures and liquid shear forces) and chemically inert 
under the chromatography conditions used. Often, it is also 
desirable that it is capable of acting as a support for chemical 
groups possessing desirable properties for chromatography 
such as hydrophobicity, ion exchange or affinity ligands 
(Section 2.4). 


a) Argo 


(c) Gradient 
begins 


Ion exchange chromatography, especially early in 
purification scheme 

Affinity chromatography and purification of GST fusion 
proteins 

IMAC (see Section 2.4.6) 

Gel filtration of proteins in Mass range 10-1500 kDa 


Desalting of protein extracts by gel filtration 

Gel filtration FPLC in the Mass range 1-300 kDa 
Reversed-phase HPLC of tryptic peptides 

Perfusion chromatography 

Membrane-based ion exchange chromatography of proteins 


2.3.4 Elution 


Elution from or development of chromatography columns 
may be achieved in three possible ways, depending on 
the specific nature of chemical interaction between sample 
components and stationary phase. These are illustrated in 
Figure 2.7. Continuous flow elution or isocratic elution is 
where mobile phase flowing through the column is continu- 
ous both in flow rate and in composition. This is especially 


(b) 


A280 


Mobile 
phase 1 


Retention 
(ml) 


Figure 2.7. Elution from stationary phase. Mobile phase is shown as dashed lines. This may vary in composition to give different pH, ion 
strength or hydrophobicity as described in text. (a) Continuous flow elution. Sample components separate in a situation where they possess 
different inherent affinities for the stationary phase and where the mobile phase composition and flow-rate remain constant. (b) Batch flow 
elution. Adsorbed sample components may be selectively eluted by washing column sequentially with a range of mobile phases. (c) Gradient 
elution. Adsorbed material is separated by the application of a gradient composed of two or more buffers in which mobile phase composition 
is continuously varied. More complex gradients (e.g. hyperbolic gradients) may be generated by continuously varying percent buffer B in 


the mobile phase. 
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applicable in partition chromatography systems. It is usu- 
ally found that leading (i.e. early eluting) peaks are sharp 
but poorly resolved with this type of elution while trailing 
(i.e. late eluting) peaks, by contrast, are well separated but 
tend to have undergone extensive peak broadening. This is 
known as the general elution problem. 

Where sample components have adsorbed to the column 
packing, they often have quite different, individual, affinities 
for the stationary phase. A description of the major types of 
adsorption chromatography used in biochemistry is given in 
Section 2.4 and it will be clear from this that these different 
affinities arise from inherent structural characteristics of the 
sample components. The chromatographer may use this to 
separate them from each other. Batch flow elution can be 
used in such a situation and involves the stepwise introduc- 
tion of different mobile phases in which the pH, polarity, salt 
concentration or some other component are varied. Condi- 
tions may be selected where one sample component remains 
bound to the stationary phase while another elutes, usually 
in a small volume. 

A variation on this is gradient elution, where pH, polar- 
ity, salt concentration are varied continuously rather than 
stepwise. This type of elution overcomes the general elution 
problem mentioned earlier, ensuring high-resolution sep- 
arations throughout the chromatogram within a relatively 
short range of elution. A gradient maker is required for 
this procedure. Automation allows the creation of complex 
gradients with a variety of shapes (Figure 2.7). Gradients 
are prepared by continuously mixing two different buffers 
(although ternary gradients, prepared from three different 
buffers are also possible) usually referred to as buffers ‘A’ 
and ‘B’, respectively. The resulting mixture acts as the mo- 
bile phase. Gradient elution would need to be performed 
to determine elution conditions for batch flow elution. This 
kind of practice is often used when scaling up chromatog- 
raphy for preparative separation such as in downstream 
processing of proteins (Section 2.5.2). In some cases, com- 
bining various gradient steps with isocratic elution may op- 
timize separation of poorly resolved peaks. 

Elution with ionic compounds such as NaCl and KCI in 
ion exchange chromatography (Section 2.4.1) may require 
high salt concentrations. In downstream processing of pro- 
teins this salt will need to be removed after chromatography 
which can be expensive and time consuming. An alterna- 
tive approach is stepwise elution with displacers, that is 
molecules with a higher affinity for the stationary phase 
than the target protein of interest. These molecules may be 
of low or of high molecular weight (e.g. heparin and starch 
derivatives, respectively) and offer the advantage that they 
induce elution of proteins in highly concentrated peaks of 
pure material at relatively low displacer molar concentra- 
tions (e.g. 40 mM). 

Because of the sensitivity of biomolecules such as pro- 
teins to their surrounding environment, it is possible to alter 


the charge, hydrophobicity and overall structure of their sur- 
face by varying pH, ion strength, and so on of the mobile 
phase. As a consequence of this, the elution profile obtained 
at a particular pH will vary if the pH of the mobile phase is al- 
tered. Since proteins are highly individual in this sensitivity 
to their environment, they will each be affected differently 
by changes in pH. Thus, by performing chromatography 
with a variety of stationary phases under a range of condi- 
tions of pH and ion strength, we can optimize conditions for 
separation. 


2.4 MODES OF CHROMATOGRAPHY 


So far, we have described chromatography systems mainly 
in terms of their resolution and performance criteria. This 
is useful because it helps us to understand why the physical 
characteristics (especially particle diameter) of the particles 
of which the stationary phase is composed are so impor- 
tant (Section 2.2). However, the chemical basis underlying 
interaction between sample components and the stationary 
phase provides a complementary description of the princi- 
pal chromatography systems used in biochemistry. We may 
define this interaction as the mode of chromatography. In the 
following sections we will look at relatively low resolution 
open column chromatography and the more high resolu- 
tion FPLC, HPLC and other systems (Sections 2.5-2.9). All 
of these systems, regardless of the very different physical 
characteristics of their stationary phases, are available in 
a variety of chromatography modes. We can illustrate this 
situation by the matrix shown in Figure 2.8. Any chromatog- 
raphy system may be located on this matrix in terms of only 
two parameters; chromatography mode and format (i.e. sta- 
tionary phase differing in physical characteristics such as 
particle diameter). This matrix is useful in that it may be 
easily extended to accommodate new formats and modes of 
chromatography. In this section, the main modes of chro- 
matography will be described. 


2.4.1 Ion Exchange 


Proteins have a net charge on their surface (Section 1.2.1) 
arising mainly from amino acid side chains some of which 
possess either a positive or negative charge at a given pH. 
Around pH 7.0 acidic amino acids (glutamic and aspartic 
acids) contribute negative charges while basic residues (ly- 
sine and arginine) contribute positive charges. At acidic pH, 
most proteins possess net positive charge while at alkaline 
pH they possess net negative charge. Positive and negative 
charges can cancel each other out so that, at a particular 
pH value, the protein may possess net zero charge even 
though it contains some positive and some negative charges. 
The pH at which a given protein possesses zero net charge 
is called the isoelectric point, pI (Figure 2.9). Since the 
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MODE 
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FORMAT 
lon Gel Reversed Hydrophobic Affinity IMAC 
exchange filtration phase interaction 
Open DEAE-cellulose} Sephadex G-25 Octyl-sepharose Glutathione IDA-agarose 
column CM-cellulose Phenyl sepharose -agarose 
NIIS-activated 
FPLC Superose-6 Cg-silica Phenyl-sepharose Saperaes- 2 IDA-Superose 
Superose-12 Cjg-silica Protein A/G- 
HPLC Fractogel TSK Cg-silane eo IDA-silica 
Poros-H Poros R Poros-PE 5- 
Perfusion Q Poros A Poros-IDA 
Poros-HS Poros R3 Poros-ET (protein A) 
Membrane ee Aerated IDA-cellulose 


-cellulose 


-cellulose 


Figure 2.8. Matrix illustrating the relationship between different chromatography systems in terms of chromatography mode and format. 


Selected commonly-used stationary phases are shown. 


surface charge, structure and pI are dependent on the amino 
acid sequence of the protein, they may be regarded as char- 
acteristic physical properties that may be used to separate 
proteins from each other. 

Ion exchange chromatography is a form of adsorp- 
tion chromatography in which charged proteins or other 
biomolecules are exchanged for small ions of like charge 
originating in salts (e.g. Nat, Cl~). These ions are attached 


pH 


10 11 12 


8 9 


Figure 2.9. Surface charge of proteins. The net charge on the sur- 
face of two proteins (pI values of 5.5 and 7.5, respectively) are 
shown at a range of pH values. Note how this varies with pH. 


to chemical structures on the surface of the stationary phase 
called ion exchange groups or ion exchangers. Two types of 
ion exchanger are common in biochemistry; Anion exchang- 
ers (which exchange negatively charged ions or anions) and 
cation exchangers (which exchange positively charged ions 
or cations; Figure 2.10). It is important to note that this ex- 
change process is independent of mass, shape or other phys- 
ical characteristics of the molecule to be separated. Proteins 
that are otherwise very different might, coincidentally, dis- 
play the same behaviour on ion exchange chromatography: 
Conversely, isoenzymes (which may be very similar in terms 
of primary structure and biological activity) might behave 
very differently. 

Ion exchangers are attached to a wide variety of solid 
supports such as cellulose, agarose and vinylbenzene. A 
selection of these is shown in Table 2.2. The physical prop- 
erties of the stationary phase as a whole (i.e. ion exchanger 
plus support) will determine whether high or low resolution 
is achievable with a given experimental system as previ- 
ously described in Section 2.2. Some ion exchangers are 
regarded as weak, that is functioning best over a compara- 
tively narrow pH range, while others are strong, that is func- 
tioning over a wider pH range. Exchangers with quaternary 
amino or sulfonic acid groups behave as strong anion and 
cation exchangers, respectively, while aromatic/aliphatic 
amino and carboxylic acid groups are weak anion and cation 
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DEAE-cellulose 


Ccoo- 
CM-cellulose 
Figure 2.10. Ion exchangers. The structure of (a) diethy- 


laminoethyl (DEAE)-cellulose, an anion exchanger and (b) car- 
boxymethyl (CM)-cellulose, a cation exchanger. 


exchangers, respectively. Choice of ion exchange stationary 
phase is heavily influenced by knowledge of the pI of the 
protein of interest. 

In an ion exchange chromatography experiment (Figure 
2.11), sample is applied to a stationary phase which has been 


Table 2.2. Commonly-used ion exchange groups 


Group 


pH range 


charged with the ion to be exchanged; the counterion (e.g. 
Cl-). The protein (and any other species of like charge in the 
sample) may exchange with this counterion, binding to the 
ion exchanger. It is important that the sample is free of com- 
ponents of like charge since these are usually present in large 
molar excess relative to the concentration of protein. The ion 
exchanger will not distinguish between proteins possessing 
a single positive charge and, for example, a Na* ion also 
present in the sample. Desalting is carried out before ion 
exchange either by gel filtration chromatography (Section 
2.4.2), by dialysis (Section 7.6.1) or by centrifugal filtration 
(Section 7.6.2). Elution of bound proteins is achieved by 
reversing the process of binding and, again, exchanging a 
counterion for protein. This is usually carried out by apply- 
ing a large excess of a salt (e.g. NaCl) containing the coun- 
terion in the mobile phase. Because proteins have different 
net charge, they may bind to an ion exchanger at a given pH 
with a variety of strengths, that is some proteins may bind 
strongly whilst others bind weakly or not at all. We can take 
advantage of this to separate proteins by applying salt in a 
continuous gradient. Weakly bound proteins will elute first 
from sucha system while strongly bound proteins elute later. 

Proteins possess some charge at most pH values (their 
net charge is zero only at the pH corresponding to their 
pl). By deliberately varying pH we may selectively alter 
net charge on their surface. Even though two proteins may 
behave identically at a given pH, they are unlikely to do so 
at all pH values. By carrying out separations across a range 
of pH, therefore, we can resolve proteins that might cochro- 
matograph at a single pH. In exploring chromatographic 
behaviour across a range of pH values it should be borne in 
mind that many proteins are structurally unstable at extremes 


Chemical structure 


Cation exchangers 


Sulphopropyl (SP) 2-12 —(CH>).—CH»—SO, 
Methyl] sulphonate (S) 2-12 —CH2—SO, 
Carboxymethyl (CM) 6-11 —O—CH,—COO- 
Anion exchangers 
Quaternary ammonium (Q) 2-12 CH3 
=CH 7N em CH, 
CH3 
Diethylaminoethyl (DEAE) 2-9 Os 
la o o CH)- CH2—- NH* 
C2H5 
Quaternary aminoethyl (QAE) 2-12 „C2H5 
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Figure 2.11. A typical ion exchange experiment. (a) Cl~ is at- 
tached to positively-charged ion exchanger. (b) Negatively-charged 
protein exchanges with CI~ and binds to the ion exchanger. (c) 
A competing Cl~ (from a NaCl gradient) elutes the protein. Note 
that positively-charged proteins pass straight through the stationary 
phase and that proteins with different net charge are separated by 
the gradient of NaCl (Figure 2.7). 


of pH. This may cause fouling of the stationary phase bed 
and also result in loss of biological activity of the protein to 
be purified. Preliminary experiments to determine pH stabil- 
ity and solubility of the protein to be purified are, therefore, 
best carried out before ion exchange chromatography. 


Gel filtration 


Direction of bead 
flow 


Mixture of proteins 


Pores 
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As well as achieving separation of fully active proteins 
under native conditions with ion exchange chromatography, 
it is also possible to carry out the experiment under dena- 
turing conditions. For example, peptides from a chemical 
or enzymatic digest of a protein may be separated in the 
presence of 4—7M urea. Urea is a chaotropic agent that 
denatures proteins by decreasing hydrophobic interactions 
within polypeptides and by disrupting ionic interactions (es- 
pecially hydrogen bonding) within and between protein sub- 
units or individual peptides in a digest. When ion exchange 
is performed in the presence of urea, subunits and peptides 
will no longer be bound together by interactions such as salt 
bridges and hydrogen bonds. Although a polar molecule, 
urea possesses zero net charge so it will not itself interact 
with the ion exchanger. 


2.4.2 Gel Filtration 


Proteins and other biomolecules often differ greatly from 
each other in their mass and shape. Gel filtration chromatog- 
raphy (also known as molecular sieve or size exclusion chro- 
matography) takes advantage of this difference by retaining 
biomolecules of a given size range and fractionating them 
in a manner related to their mass in a form of partition chro- 
matography. This is possible because the stationary phase 
employed in this method is made up of a gel consisting of 
beads containing pores of a defined and narrow size range. 
These will allow only biomolecules below a particular mass 
to diffuse into the pore. This is referred to as the cutoff value 
or exclusion limit (Figure 2.12) of the stationary phase. A 
variety of different stationary phases, each with its own 
particular cutoff value is commercially available and a se- 
lection of these is summarized in Table 2.3. Once inside 
the bead, the biomolecule will be retained momentarily in 


Proteins exceeding 
exclusion limit 
cannot enter bead 


Small proteins enter 
bead through pores 
and are retained 


Figure 2.12. Gel filtration chromatography of proteins. Note that only proteins smaller than the bead exclusion limit may enter the bead. 
Larger proteins are excluded. Therefore, proteins elute in inverse order of native mass. 
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Table 2.3. Gel filtration media. A selection of commonly-used gel filtration media with their individual fractionation 
range and composition 


Fractionation range kDa 
Resin (globular proteins) Application 


SEPHADEX-(dextran) 


G-10 <0.7 Desalting 

G-25 1-5 Desalting 

G-50 1.5-30 Peptide separation 

G-75 3-80 Protein fractionation 

G-100 4-150 Protein fractionation 

G-150 5-300 Protein fractionation 

G-200 5-600 Protein fractionation 

SEPHAROSE-(agarose) 

6B 10-4000 Protein fractionation 

4B 60-20 000 Protein fractionation/ligand immobilisation 

2B 70—40 000 Fractionation of nucleic acids, particles and 
viruses 

SEPHACRYL (dextran crosslinked with V,N’-methylene bisacrylanlide) 

S-100 HR 1-100 Peptide/protein fraclionation 

S-200 HR 5-250 Protein fraclionation 

S-300 HR 10-1500 Protein fraclionation 

S-400 HR 20-8000 Protein fractionation of molecules with 
extended structures 

S-500 HR 80-80 000 Large molecules and small particles 

BIOGEL (polyacrylamide) 

P-2 0.1-1.8 Desalting 

P-4 0.8—4 Desalting 

P-6 1-6 Peptide separation 

P-10 1.5-20 Fraetionation of small proteins 

P-30 2.540 Protein fractionation 

P-60 3-60 Protein fractionation 

P-100 5-100 Protein fraciionation 

P-150 15-150 Protein fraciionaiion 

P-200 30-200 Protein fractionation 

P-300 60—400 Fractionation of extended structures 

SUPERDEX-(dexlran covalently bonded to highly crosslinked agarose) 

75 3-70 FPLC gel nitration 

200 10-600 FPLC gel filtration 

SUPEROSE- 

(cross-linked 
agarose) 
12 1-300 FPLC gel filtration 
6 5-5000 FPLC gel filtration 
FRACTOGEL-(polyvinyl chloride) 

TSK HW-40 0.1-10 HPLC gel filtration (peptides) 

TSK HW-55 1-700 HPLC gel filtration (proteins) 

TSK G2000SW 5-100 HPLC gel filtration (proteins) 

TSK G3000SW 10-500 HPLC gel fillraiion (proteins) 

TSK HW-65 50-5000 HPLC gel filtration (proteins) 


TSK HW-75 500-50 000 HPLC gel filtration (complexes) 
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Example 2.2 Determination of native mass of a protein by gel filtration 


Gel filtration beads covering a particular molecular mass range are selected and packed into a column (Table 2.3). Unlike 
other chromatography formats column geometry is important in this experiment. Gel filtration columns are usually long 
relative to their column diameter. However, the low tolerance of gel filtration resins to hydrostatic pressure places an 
upper limit on the volume of resin which may be used (i.e. the column cannot be too long). Conversely, the loading 
volume of sample is also limited (usually 1-2% of the total column volume) which means that the column volume cannot 
be too small either. Once packed, the void volume of the column is determined by passing a solution of blue dextran 
through it and measuring its elution volume. Standard proteins of known native M, are then applied to the column to 
calibrate it. These may be applied singly or in pairs (provided the two proteins are well-separated on the column and can 
clearly be distinguished by some other technique such as spectroscopy or electrophoresis). In selecting standards it is 
important that they are comparable to the sample protein under investigation (i.e. that they have the same general shape 
characteristics). 
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The elution volumes (retention) of sample and standard proteins were determined on Superose 12. There is a linear 
relationship between log molecular mass and elution volume and this relationship allows creation of a standard curve 
for the column. From this plot, native mass of the sample protein may be estimated as 100 kDa. Note that the plot is 
slightly curved at the ends, suggesting a deviation from linearity. Comparisons should only be made in the linear part of 
the plot and, if necessary, the experiment should be repeated using a resin with a slightly different fractionation range. 
Electrophoretic analysis (Section 5.3.1) indicated that this protein had a subunit mass half that of its native mass (i.e. 
50 kDa) suggesting a dimeric quaternary structure. 


channels passing through it. As the sample progresses 
through a bed of such stationary phase, smaller molecules 
will be retained more readily than larger ones. Molecules of 
greater mass than the exclusion limit will pass freely through 
the column, eluting in a single, unresolved, peak at the void 
volume. Molecules within the fractionation range of the gel, 
however, will be retained in a manner generally inversely 
proportional to their mass, that is smaller molecules will 
be retained longest. The fractionation range and resolution 
achievable with such a chromatography system will depend 
greatly on the composition and uniformity of the population 
of beads in the stationary phase. 

Gel filtration chromatography has three main applications 
in biochemistry. It may be used in a routine way to remove 
low molecular weight components from samples. For ex- 
ample, as pointed out in the previous section, it is possible 
to desalt protein solutions by passage through a column of 


Sephadex G-25 (fractionation range 1-5 kDa). This appli- 
cation takes advantage of the cutoff value of these beads 
5kDa) rather than of their fractionation range. Such chro- 
matography is also useful for rapid exchange into a buffer 
of different composition with minimum dilution of protein 
sample. 

A second application depends on the fact that the elution 
volume of proteins from gel filtration columns depends on 
their mass. Thus, if we determine the elution volumes of pro- 
teins of known mass on a given gel filtration column, we can 
easily determine the mass of a protein of unknown mass by 
plotting log M, versus elution volume (Example 2.2). This 
mass is the native mass of the protein. It is important to note 
that this experiment makes a number of important assump- 
tions. Firstly it assumes that the protein has similar shape 
to the standards used (i.e. proteins of known mass). This is 
acceptable if both the protein and standards are globular, for 
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example. However, molecules with large axial ratios (i.e. 
rod-like in shape rather than globular; e.g. collagen) behave 
as if they have a much greater mass on gel filtration. Where 
little is known about the overall shape of the protein, care 
should be taken in interpreting gel filtration data. A further 
point to note is that some proteins may experience secondary 
ionic or ion exchange interactions with the stationary phase 
beads. This can result in the molecule being retained on 
gel filtration in a way that may not depend on mass alone. 
One means of avoiding such effects is to carry out the ex- 
periment in the presence of 100-300 mM NaCl or a similar 
salt. 

The mass of individual polypeptides or subunits of which 
oligomeric proteins are composed may be determined under 
denaturing conditions by a number of techniques such as 
SDS polyacrylamide gel electrophoresis (Chapter 5) or mass 
spectrometry (Chapter 4). Determination of native mass may 
help us to determine the quaternary structure of proteins by 
comparison with results from denaturing experiments. 

A third application of gel filtration chromatography is in 
protein purification. Since the method depends on molecular 
mass, it may be used to separate proteins based on differ- 
ences in their mass. An important limitation to this approach 
is the loading volume that can be achieved. Generally, this 
is a maximum of approximately 5% of the bed volume of 
the column. This technique is therefore often used near the 
end of a purification strategy as a polishing step. Neither gel 
filtration nor ion exchange chromatography alone is capa- 
ble of achieving complete purification of proteins. However, 
since it is unlikely that two proteins will have the same net 
charge and mass, combining the two approaches can often 
achieve impressive purification (Section 2.10.1). 


2.4.3 Reversed Phase 


As well as possessing characteristic mass, shape and sur- 
face charge, biomolecules may also differ from each other 
in terms of their net hydrophobicity. Even soluble proteins 
will often possess regions of their structure that are rather 
hydrophobic. These regions will depend on the primary 
structure of the protein and may be used to separate oth- 
erwise very similar proteins or peptides. If we immobilize 
hydrophobic groups such as hydrocarbon chains on a sta- 
tionary phase, sample components may bind to the chains 
as a result of these hydrophobic parts of their structure. 
This is the basis of reversed phase chromatography. In this 
type of partition chromatography, we designate the station- 
ary phase by the length of hydrocarbon chain used (e.g. 
C-4, C-8, C-18). In general, shorter length chains are used 
for chromatography of proteins while longer ones are more 
appropriate for peptides because of their greater relative hy- 
drophobicity. The conditions required for chromatography 
of proteins, in particular, frequently leads to their denatura- 
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Figure 2.13. Reversed phase chromatography. (a) Samples inter- 
act with nonpolar stationary phase by hydrophobic interaction and 
are eluted by increase in hydrophobicity of mobile phase. The three 
sample components shown (1 to 3) have increasing hydrophobic- 
ity. (b) Elution profile obtained with sample components 1 to 3%. 
Acetonitrile is shown by dashed lines. 


tion. Accordingly, reversed phase chromatography has many 
applications in analysis of proteins but is not very useful for 
the purification of biologically active proteins. 

In this type of chromatography (Figure 2.13) sample is 
applied in a polar solvent such as water, methanol, acetoni- 
trile or tetrahydrofuran or mixtures of these. Sample com- 
ponents attach to the hydrophobic chains by hydrophobic 
interactions. Sometimes, the sample components separate 
isocratically. More often, however, the column is devel- 
oped with a gradient of increasing solvent (e.g. acetoni- 
trile) and decreasing water (thus giving a gradual increase 
in mobile phase hydrophobicity). When the mobile phase 
becomes sufficiently hydrophobic, individual sample com- 
ponents elute from the stationary phase. Polar sample com- 
ponents elute first from this system (i.e. they show weakest 
hydrophobic interactions with hydrocarbon chains) while 
nonpolar components elute later (i.e. they show strongest 
hydrophobic interactions with hydrocarbon chains). Chro- 
matography is frequently carried out in the presence of ion 
pair reagents. These contain a counterion (i.e. an ion of op- 
posite charge to the sample component to be separated). In 


C-18 reversed phase chromatography of peptides, for exam- 
ple, 0.1% trifluoroacetic acid (TFA) is included in the mobile 
phase. The two oppositely charged species (i.e. peptide plus 
TFA) form an ion pair that has sufficient hydrophobic char- 
acter to be retained by the stationary phase. Inclusion of ion 
pair reagents strongly affects the chromatography of ionic 
species in the sample but has no effect on the behaviour of 
nonionic sample components. 

To a greater extent than in most other chromatography 
systems, retention of sample components in reversed phase 
chromatography is highly sensitive to the composition of 
the mobile phase. Comparatively small changes in polarity, 
temperature, pH and the presence of ion pair reagents in the 
mobile phase can exert profound effects on the chromatog- 
raphy achieved. This allows extremely high-resolution sep- 
arations of otherwise quite similar sample components (Ex- 
ample 2.3). 

An interesting approach to reversed phase chromatogra- 
phy involves the use of commercially available cartridges 
filled with stationary phase. These are relatively cheap and 
disposable, may be used with a syringe thus obviating the 
need for acolumn, pump, and so on and have a large capacity. 
Moreover, many of these cartridges may be placed in series, 
thus allowing loadings to be increased indefinitely. Since 
the reversed phase in these cartridges is identical to that 
used in reversed phase HPLC (Sections 2.6.2 and 2.10.2), it 
is possible to reproduce closely separations that have been 
performed on an analytical column. A good example of this 
type of experiment is the quantitative purification of partic- 
ular peptides from proteolytic digests for further structural 
or functional analysis. 

A much less widely used, though related technique to 
reversed phase is normal phase chromatography (Figure 
2.14). In this system, the stationary phase is polar (e.g. alky- 
lamine bonded to silica) while the mobile phase is nonpolar 
(e.g. heptane) and molecules elute in inverse order compared 
to reversed phase chromatography ina gradient of increasing 
polarity (e.g. methanol). This technique is especially useful 
for samples with low water-solubility. Historically, this type 
of system (nonpolar mobile and polar stationary phases) 
was introduced first but was later largely superseded by a 
system reversing these phases (i.e. polar mobile and non- 
polar stationary phases), hence the term ‘reversed phase’ 
chromatography. 


2.4.4 Hydrophobic Interaction 


Soluble proteins require a solvation shell of water on their 
surface to maintain solubility in aqueous solution. This wa- 
ter masks hydrophobic groups that also exist on the protein 
surface. In the presence of reagents that are capable of bind- 
ing water from the solvation shell (e.g. (NH4)2SOx), it is 
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Figure 2.14. Normal phase chromatography. (a) Samples interact 
with a polar stationary phase in the presence of N-heptane and 
are eluted by increasingly polar mobile phase. (b) Elution profile 
obtained with sample components 1 to 3 identical to those in Figure 
2.13. Note that these elute in inverse order compared to reversed 
phase chromatography percent. Methanol is shown by dashed lines. 


possible to disrupt solvation and expose these hydropho- 
bic groups. Using a hydrophobic stationary phase such as 
provided by bonded octyl (strongly hydrophobic) or phenyl 
(more weakly hydrophobic) groups, itis possible to promote 
hydrophobic interactions between protein and such a bonded 
phase. This is the basis of the adsorption chromatography 
technique called hydrophobic interaction chromatography. 
Protein samples are applied to columns containing octyl or 
phenyl groups in the presence of a large concentration (ap- 
prox. 2M) of a highly polar solvent such as (NH4)2SOq. 
In such solvents, hydrophobic interactions are strongly pro- 
moted between proteins and the stationary phase. Apply- 
ing a decreasing gradient of solvent polarity, (e.g. 2-0 M 
(NH4)2SO4) gradually disrupts hydrophobic interactions, 
thus separating proteins (with different net hydrophobic- 
ity) from each other. Alternatively, elution may be achieved 
by the use of a hydrophobic displacer molecule such as a 
nonionic detergent (e.g. Triton X-100), an aliphatic amine 
(e.g. butylamine) or an aliphatic alcohol (e.g. butanol). Since 
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Example 2.3 Tryptic peptide mapping of glutathione transferase isoenzymes by reversed phase HPLC 


Many proteins such as globins and immunoglobulins are expressed in living systems as complex protein families. Within 
an individual family there is close similarity in primary structure and we are often interested in identifying the extent 
of this similarity. Some proteolytic enzymes cleave substrate proteins specifically at bonds involving particular amino 
acids, thus generating a population of peptides which is directly dependent on the amino acid sequence of the substrate 
protein. Experiments in which these peptides are separated from each other are called peptide mapping experiments. Even 
closely-related proteins within a family would be expected to produce peptide maps containing some common peptides 
and some peptides specific to each protein. 

An example of a complex protein family is the glutathione transferases (GSTs), a group of detoxification enzymes 
which catalyze conjugation reactions between foreign (usually electrophilic) chemicals (xenobiotics) and the intracellular 
tripeptide glutathione. Two isoenzymes were identified which showed close similarity in substrate specificity, tissue 
expression and electrophoretic behaviour. One of these was a homodimer of the 3-subunit (rGST M3-3) while the 
other was composed solely of the 4-subunit (rGST M4-4). Notwithstanding their similarities, these isoenzymes could 
be separated by ion exchange chromatography. Samples of each were purified, digested with trypsin (which cleaves 
specifically at peptide bonds containing lysine or arginine) and peptide maps resolved by reversed phase C-18 HPLC. 
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The maps for rGST M3-3 (a) and rGST M4-4 (b) from rat liver could easily be compared. The column dimensions 
were 0.39 x 30cm and the particle size was 10 um. Retention is represented from right to left (flow-rate; 1 ml/min) 
and detection was at 254 nm. Peptides unique to the 3-subunit are labeled ‘A’, those unique to the 4-subunit are labeled 
‘D’ and unlabelled peptides were common to both proteins. The mobile phase contained 0.05% TFA and comprised a 
0-60% acetonitrile gradient (diagonal solid line). Commencement of chromatography is marked with arrows. This type 
of comparison is also possible with other techniques such as mass spectrometry (Chapter 4) and electrophoresis (Chapter 
5). Reversed phase HPLC is especially useful for analysis of tryptic peptide maps since this protease generates a large 
number of small peptides differing widely from each other in hydrophobicity. Other cleavage treatments (e.g. CNBr 
which cleaves at methionine) can generate fewer and larger fragments. 


(NH4)2SOz, precipitation is frequently an early stage in pro- 
tein purification, this type of chromatography may be conve- 
niently used subsequent to this step in a protein purification 
strategy. Important experimental variations in this technique 
include; using stationary phases of differing hydrophobic- 
ities (e.g. phenyl or octyl), different buffer concentrations 
and composition, varying pH (lower pH favours binding) 
and temperature (higher temperature favours binding). 

There is considerable similarity between this technique 
and reversed phase chromatography. Both depend essen- 
tially on hydrophobic interactions between sample compo- 
nents and a bonded stationary phase and on solvent polarity. 
They differ from each other, however, in the precise experi- 
mental conditions used to bring these interactions about and 
this has consequences for their applications in biochemistry. 
In reversed phase chromatography, elution arises as a result 
of the nature of the mobile phase, that is generally quite 
hydrophobic. A consequence of this is that proteins are de- 
natured in this type of system. This means that reversed 
phase chromatography depends on the total hydrophobic- 
ity of the protein and is especially useful in analytical ap- 
plications since this is an inherent property of the protein 
determined by its primary structure. This technique is, how- 
ever, not appropriate where it is desired to retain a protein’s 
biological activity. In hydrophobic interaction chromatog- 
raphy, interactions depend mainly on gentle disruption of 
the solvation shell, that is hydrophobic interactions only oc- 
cur between groups on the surface of the protein and the 
stationary phase. These are determined largely by the ter- 
tiary structure of the protein and, again, will be different 
for each protein. This technique, therefore, retains native 
protein structure and function. It is consequently widely 
used in protein purification as a preparative technique 
(Example 2.4). 


2.4.5 Affinity 


Recognition of different chemical shapes and structures is 
a fundamental and widespread property of biomolecules. 
For example, an enzyme is capable of recognizing its sub- 
strate and distinguishing it from other molecules that may 
be chemically similar. This type of biospecific recognition 
is known as affinity and advantage may be taken of this in 
the purification of enzymes and other biomolecules in the 
adsorption chromatography technique called affinity chro- 
matography. The fundamental principle of this technique is 
immobilization of a small molecule or affinity ligand on a 
stationary phase and application of sample containing the 
biomolecule to be purified to this phase. Because of the 
highly specific nature of biological recognition phenomena, 
only the molecule to be purified will bind to the station- 
ary phase and other molecules will pass freely through the 
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Table 2.4. Examples of group-specific ligands used in affinity 
chromatography 


Ligand Molecule purified 
Substrates Enzymes 

Inhibitors Enzymes 

Cofactors Enzymes 

Avidin Biotin-containing enzymes 
Antibodies Antigen 

Antigen Immunoglobulins 
Proteins A and G Immunoglobulins 
Hormone Receptor/binding protein 
Glutathione GST fusion proteins 
Soyabean lectins Glycoproteins 

Oligo dT PolyA mRNA 

Oligo A PolyU RNAs 

Lysine rRNA 


chromatography system. Examples of ligand-molecule pairs 
applicable to this kind of experiment are given in Table 2.4. 
As aresult of its high degree of specificity, this type of chro- 
matography is regarded as one of the most effective means 
of purifying proteins, often in a single step. 

Unlike many of the other chromatography techniques so 
far described it is necessary to design a specific stationary 
phase for each protein to be purified which will bind this pro- 
tein and none other. This requires considerable thought and 
experimentation but, in view of the high purification factors 
(the ratio of specific activity of protein to be purified at each 
step of the purification to that of the crude extract) achiev- 
able with this technique, it is usually worthwhile. Because of 
the gentle conditions employed, the technique is especially 
appropriate where it is desired to purify a molecule retain- 
ing full biological activity. For example, enzyme molecules 
that may have become inactivated during preparation of cell 
extract will not normally bind to a resin containing an immo- 
bilized substrate or inhibitor. Only fully active enzyme will 
bind, thus ensuring specific selection of fully active protein 
from the extract. This will maximize specific activity at this 
step of the purification, in turn, maximizing the purification 
factor achieved. 

An important aspect of the technique is immobilization 
of the ligand. Immobilization techniques take advantage of 
functional groups on the ligand such as -NH2, -SH, -COOH 
and -OH. -OH groups on the stationary phase (usually 
agarose) require activation to render them chemically reac- 
tive. The affinity ligand is then immobilized on the activated 
resin. The main techniques used to achieve activation are 
shown in Figure 2.15 while methods for immobilization are 
shown in Figure 2.16. Epoxy- and CNBr-activated resins 
are commercially available. 
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Example 2.4 Hydrophobic interaction chromatography of snake venom Phospholipase A, 


Phospholipases are responsible for a wide range of pharmacological effects including haemolysis of red blood cells, 
hypertension and prevention of blood coagulation. They are important components in snake venom and may contribute 
to hemorrhage in bitten patients. In order to study these enzymes in detail, for example to screen possible antidotes or 
other treatments, it is desirable to obtain them in purified form. 
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Venom obtained from the snake Pseudechis papuanus was fractionated into several protein peaks by Mono-S cation 
exchange FPLC chromatography. One of these peaks displayed phospholipase A2 activity and this was further fractionated 
into two peaks by hydrophobic interaction FPLC chromatography on phenyl-superose. The column had a volume of 
one ml and was equilibrated at pH 7.2 in 2M (NH4)2SO,. Sample was applied to the column in 2M (NH4)2SOq. 
Phenyl-superose was developed with a gradient of 2-0 M (NH4)2SO, as shown by dashed line (buffer B was 20 mM 
Tris-Cl, pH 7.2 containing no (NH4)2SOxz). Peak 1 contained phospholipase A2 activity and was pure by the criterion of 
electrophoresis. Reproduced from Laing et al. (1995) Characterization of a purified phospholipase A, from the venom 
of the papuan black snake (Pseudechis papuanus). Biochimica et Biophysica Acta, 1250, 137-143, by permission of 
Elsevier. 


Sometimes, it is necessary to introduce some distance be- 
tween the ligand to be immobilized and the stationary phase 
particle itself. This is to avoid steric hindrance by the solid 
phase particle. A good example of this would be where a 
protein to be purified cannot bind to a ligand attached di- 
rectly to the stationary phase because of the geometry of its 


ligand binding site. Some commercially available stationary 
phases are provided in a derivatized form (e.g. AH- and CH- 
agarose; see Figure 2.16). In these, a spacer arm 6-carbon 
atoms long is provided by hexane. At the end of the spacer 
is either an -NH, (AH-agarose) or -COOH (CH-agarose) 
group. Ligand may be attached at these groups by formation 
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Figure 2.15. Activation of agarose for affinity chromatography. Important methods for activation of agarose are; (i) CNBr, (ii) epoxy, (iii) 


carbonyldiimidazole, (iv) dichlorotriazine, (v) trecyl. 


of an amide linkage with ligand -COOH or —NH) groups, 
respectively, in the presence of dicyclohexylcarbodiimide. 
By use of diamines of varying lengths, it is also possible 
in the laboratory routinely to create a range of spacer arms 
of differing lengths using resins activated as shown in Fig- 
ure 2.15 and to couple -COOH groups of ligands at one of 
their terminal -NH, groups with dicyclohexylcarbodiimide 
(Figure 2.16). For immobilized macromolecules (e.g. anti- 
bodies, antigens) use of spacer arms is usually not necessary. 

Since contaminants may sometimes bind to the station- 
ary phase in a non specific and low affinity manner, it is 
best to wash the column with a wash buffer containing 
100-200 mM NaCl after the sample has passed through the 
column. Elution of specifically bound molecules can some- 
times present difficulties due to too great an affinity for the 
immobilized ligand. One approach to eluting such molecules 


is to change the pH or ion strength by applying a stationary 
phase with a different buffer composition. For very strongly 
bound samples, the use of chaotropic reagents such as urea 
may have to be considered. However, the use of harsh con- 
ditions may lead to loss of biological activity of the protein 
to be purified. The likely strength of binding and specificity 
of the molecule to be purified for the immobilized ligand 
are important criteria in the design of the stationary phase 
to be used in this technique. A gentler approach is to elute 
the bound molecule with an excess of the affinity ligand in 
the mobile phase. Soluble and immobilized versions of the 
ligand compete with each other for the binding site on the 
protein, thus gently eluting it from the stationary phase. A 
variation of this approach which is useful in situations where 
closely related proteins (e.g. isoenzymes) have bound to a 
single immobilized ligand is to create an affinity gradient, 
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Figure 2.16. Coupling of affinity ligands. Attachment of -NH3 groups to(i) CNBr-activated resin, (ii) epoxy-activated resin, (iii) 
dichlorotriazine-activated resin, (iv) tresyl-activated resin, (v) 6-aminohexanoic-acid-agarose (CH-agarose). Attachment of -OH groups 
to (vi) carbonyldiimidazole-activated resin. Attachment of -SH groups to (vii) thiopropyl-agarose. Attachment of -COOH groups to (viii) 
1,6-diaminohexane-agarose (AH-agarose). Couplings (v) and (viii) are carried out with dicyclohexylcarbodiimide (DCI) which promotes 
amide bond formation between -COOH and -NH3 groups. (Continued) 
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Figure 2.16. (Continued) 


where the concentration of competing ligand is gradually 
increased in the mobile phase. Since individual isoenzymes 
may differ from each other in affinity for the ligand (e.g. 
isoenzymes may have differing Km’s for their substrates 
or Ki’s for their inhibitors), they may be competed off the 
stationary phase at different points in the gradient. 

A popular application of affinity chromatography is in the 
recovery of proteins expressed in an E. coli expression sys- 
tem as fusion proteins with glutathione transferases (GSTs; 
Example 2.5). The GST component of the fusion binds to 
a glutathione-agarose column and the fusion protein may 
thus be purified from other E. coli proteins. Subsequently, it 
is collected by elution with the affinity ligand, glutathione. 
The expressed protein may be released by proteolytic cleav- 
age with thrombin, and the GST is removed by a second 
passage through the affinity column. 


2.4.6 Immobilized Metal Affinity Chromatography 


A variation on conventional affinity chromatography and 
an alternative approach to the use of GST fusion expres- 
sion systems is immobilized metal affinity chromatography 
(IMAC; also sometimes referred to as metal chelate chro- 
matography). In this technique, metals are immobilized on 
a stationary phase and metal-binding regions of proteins 
bind to this ligand. Metals such as Cu, Zn, Ca, Ni and Fe 
may be conveniently immobilized by chelation to a group 
such as that provided by iminodiacetic acid (IDA) bonded 
to a stationary phase such as agarose or silica. A nitrogen 
and two oxygens in IDA coordinate with the metal leaving 
three sites available for binding to proteins (Figure 2.17). 
Amino acid side chains most likely to coordinate to metals 
are those of histidine, cysteine and tryptophan. A limitation 
of this approach is the fact that the metal may be gradu- 
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ally leached from the stationary phase due to the weakness 
of interaction with IDA. Other groups are now commer- 
cially available which coordinate at four sites on the metal 
thus facilitating more stable binding to the stationary phase. 
Metals may be regarded as group-specific ligands, that is 
a variety of proteins can bind to a single metal. Specificity 
is achieved by varying the elution procedure, especially the 
mobile phase characteristics such as pH, and in the choice 
of gradient. Proteins bind best to IMAC stationary phases at 
neutral to alkaline pH values and in the presence of 0.5-1 M 
NaCl (which avoids nonspecific electrostatic interactions). 
If desired, a given metal may be stripped from the stationary 
phase by treatment with EDTA and replaced with a different 
metal. Elution is usually achieved by either a decreasing pH 
gradient or by displacement with an electron donor such as 
imidazole. In selecting mobile phase buffer components for 
IMAC, molecules containing primary or secondary amines 
should be avoided, especially around neutrality. 

An application of IMAC to purification of proteins ex- 
pressed in bacteria (Example 2.6) is the use of the His-tag 
system, that is the creation of a fusion protein between the 
protein to be purified and a chain of six histidine residues 
(His-tag) that provides a binding site for an IMAC stationary 
phase with four metal coordination sites (see above). Com- 
bination of the use of six histidines and a more stably bound 
metal gives 1000-fold greater specificity of binding than that 
observed with IDA-agarose. As with the GST fusion protein, 
the His-tag may be removed by protease treatment at a cleav- 
age site included in design of the fusion (e.g. enterokinase at 
the N-terminus or carboxypeptidase at the C-terminus with 
an arginine included to terminate digestion). This may be 
followed by a second passage through the IMAC stationary 
phase to remove the His-tag. Often, the tag does not greatly 
affect the properties of the protein to which it is fused, so it 
is not necessary to remove it. 
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Example 2.5 GST fusion protein expression system; an application of affinity chromatography 


To purify a protein of interest, it is usual to identify a tissue or cell-type which is rich in this protein and to use this 
as the starting material for a purification protocol. However, some proteins are normally expressed in cells at very low 
concentration or they may be transiently expressed for short periods of the cell cycle. It may be difficult to purify enough 
of such a protein for structural or functional analysis using the natural site of expression as a starting point. An alternative 
approach to this problem is to clone the gene for the protein of interest and to express the protein in a suitable expression 
system. This experiment also lends itself to the expression of derivatives of the protein such as mutants and chimeras 
(fusion proteins). A major drawback to this approach is the possibility that the cells in which the protein is expressed 
may not be capable of correct post-translational processing. A further problem is the fact that the protein still requires to 
be laboriously purified by column chromatography methods. 
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Purification of a cloned protein from an expression system may be facilitated by constructing a fusion between the 
gene of interest and the gene for a protein known to bind to a specific affinity ligand. A popular choice for this experiment 
is GST since this protein binds specifically and quantitatively to glutathione (GSH)-agarose. 

Protein is expressed in E. coli as a fusion with GST in an appropriate expression vector. In the illustration, this protein 
is represented in white, the GST moiety in grey and the thrombin cleavage site is hatched. 

(1) This fusion selectively binds to the affinity resin GSH-agarose facilitating the removal of background E. coli 
proteins which are simply the proteins normally expressed by the E. coli cell in the absence of the fusion. (2) The fusion 
protein is then eluted by 5mM GSH which competes with the GSH affinity ligand for the GSH binding site of the 
GST moiety. (3) The purified fusion protein is collected and treated with thrombin which releases the GST moiety from 
the fusion. A second passage through the affinity column removes the GST, resulting in pure protein. In principle, this 
approach can be taken to improve the expression/purification of any protein capable of being cloned. 
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Figure 2.17. Immobilization of metal on IDA-agarose. 


2.4.7 Hydroxyapatite 


Hydroxyapatite is a crystalline form of calcium phosphate 
with the chemical formula Caio(PO4)6(OH)2. Usually, this 
material is equilibrated with dilute (e.g. 0.1 mM) phosphate 
buffers and significant numbers of phosphate ions become 
immobilized on the resin as shown in Figure 2.18. It is 
possible for positively charged proteins to bind to the phos- 
phate ions and these may easily be eluted by altering the 
phosphate, salt or divalent metal (e.g. Ca?+ or Mg?) con- 
centrations of the eluting buffer. The latter metals neutral- 
ize the negative charges of the phosphate ions. Negatively 
charged proteins may also bind to hydroxyapatite by inter- 
action of carboxylate side chains with Ca?+ in the resin. 
However, these proteins are also simultaneously electrostat- 
ically repulsed by bound phosphate ions. Elution of these 
latter proteins is by application of a gradient of an ion that 
binds more strongly to Ca?+ such as phosphate or fluoride. 
In practice, hydroxyapatite is first equilibrated with dilute 
phosphate buffer and then washed sequentially with 5 mM 
MgCl, (possibly as a 0-5 mM gradient) to remove basic 
proteins, 1 M NaCl (or a 0-1 M gradient) to remove neutral 
proteins and lastly a gradient of 0.1-0.3mM phosphate to 
elute acidic proteins. 

Because of its difficult handling and flow properties and 
limited capacity, hydroxyapatite is not as popular as some 
of the other modes of chromatography described in this 
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section. However, since it adsorbs and desorbs proteins in 
a unique manner, the technique often allows purification 
of proteins that may have proved difficult to separate by 
the other chromatography modes and provides a valuable 
complement to them. 


2.5 OPEN COLUMN 
CHROMATOGRAPHY 


2.5.1 Equipment Used 


The modes of chromatography described in Section 2.4 may 
be performed in a chromatography system operating at at- 
mospheric pressure, using stationary phase particles with 
relatively large (e.g. 50-150 uM) diameters. Such a system 
is called open column chromatography or low performance 
liquid chromatography. This gives relatively low resolution 
and is slow compared to the more high-resolution systems 
described in the following sections (Sections 2.6-2.9). How- 
ever, there are also a number of compelling advantages to 
the use of open column chromatography. Stationary phases 
and other equipment used are cheap making this approach 
especially convenient for large-scale preparative chromatog- 
raphy. This lends itself to steps used early in protein purifi- 
cation procedures (Section 2.10.1), where a priority is the 
decrease of total protein in the extract to facilitate subse- 
quent application to high-resolution systems. 

Columns and tubing used at low pressure may be con- 
structed out of cheap and readily available materials. Gener- 
ally, plastic reservoirs and tubing are used for mobile phase. 
Stationary phases are retained in glass or plastic columns ei- 
ther by a high porosity glass sinter or else by placing plastic 
wool at the bottom of the column. 

A degree of standardization of flow rate is possible with 
a peristaltic pump although flow rates achieved will be lim- 
ited by the hydrodynamic properties of the stationary phase. 
Since this is generally composed of polysaccharide-based 
gels that are quite soft, they are limited in the hydrostatic 
pressures which they can accommodate without compres- 
sion. Most systems flow adequately under gravity although 
the flow rate may vary somewhat during chromatography 
as mobile phase level in the reservoir drops (Figure 2.19). 
When using gravity flow, a safety loop is usually included in 
the system design. The lowest point of the loop carrying flow 
to the column is deliberately placed lower than the outflow 
from the column. If the mobile phase reservoir becomes un- 
expectedly exhausted during the experiment, flow will cease 
as soon as mobile phase reaches the lowest point of the loop. 
This protects the gel bed from accidentally drying out. 

An on-line ultraviolet detector gives a continuous trace 
of protein elution and use of a fraction collector allows 
peak collection. It is also possible to measure ultraviolet 
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Example 2.6 IMAC chromatography of a metal-binding protein using His-TAG 


An alternative, though similar approach to that described in Example 2.5 is the use of immobilized metal affinity 
chromatography (IMAC) which takes advantage of metals immobilized on an affinity matrix. A fusion of the gene coding 
for the protein to be purified (white) with a sequence of six histidine residues (grey) called a his-tag is constructed 
which contains an enterokinase cleavage site (hatched) as shown in the illustration. This fusion is expressed in an E. coli 
expression system. 
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(1) When proteins from such an expression system are applied to the IMAC column, the background (i.e. E. coli) 
proteins simply pass straight through the column. The N-terminal (histidine), tag recognizes metal immobilized on the 
IMAC stationary phase and binds allowing convenient, one-step removal of untagged contaminants. (2) The purified 
fusion is collected by varying the pH of the elution buffer (usually making it more acidic). Note that stability to a range 
of pH is therefore essential for proteins to be purified by this method. (3) The his-tag may be removed, if necessary, 
by treatment with enterokinase followed by a second passage through the IMAC column. Frequently, the presence of a 
his-tag has no structural or functional effect on the protein so it is often left in place. 

The protease cleavage sites used in this type of experiment are deliberately chosen because they occur rarely in proteins 
thus avoiding inadvertent proteolytic cleavage of the protein to be purified. 


absorbance after separation on a bench spectrophotometer 
and to plot the elution profile manually. Since this chro- 
matography can be quite slow (hours to days), it is usually 
necessary to carry it out either in a cold-room or a refriger- 
ated cabinet (4 °C). 


Depending on the mode of chromatography to be em- 
ployed, some attention should be given to column design. 
For gel filtration chromatography, long, narrow columns are 
necessary to maximize resolution whereas for most types of 
adsorption chromatography, shorter and wider columns may 
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Figure 2.18. Hydroxyapatite chromatography. Ca?+ ions of hy- 
droxyapatite attract phosphate ions from buffer (small circles). 
These, in turn, electrostatically attract positively-charged proteins. 
Negatively-charged proteins must compete with phosphate to bind 
at Ca?+, overcoming electrostatic repulsion. Development of the 
column is as described in text. 


be used. There is, however, a limit to the length of column it 
is possible to use in gel filtration chromatography since, as 
mentioned above, soft gels may be subject to compression 
if exposed to large hydrostatic pressures which will in turn 
lead to decreased flow rates. 
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Figure 2.19. Decrease in flow-rate as a result of drop in mobile 
phase level in reservoir. Hydrostatic pressure across the column 
during chromatography is represented by arrows. 
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2.5.2 Industrial Scale Chromatography of Proteins 


The cheapness and large capacity of open column chro- 
matography systems make them especially appropriate to 
industrial scale protein purification. Modern production of 
proteins for pharmaceutical use is extensively regulated by 
agencies such as the US Food and Drug Administration 
(FDA). Protein drugs are amongst the most expensive to pro- 
duce industrially and there has been a remarkable increase in 
FDA approvals for protein drugs in recent years (21 products 
approved in 2005 alone and 92 in the period 2000-2005). 
Other important applications of proteins include diagnos- 
tics, use as laboratory reagents and as additives in food and 
brewing. These latter proteins may have significantly less 
demanding purity requirements which make them relatively 
cheaper to produce. Expression of proteins of interest in 
a wide range of heterologous systems (e.g. expression of 
human genes in microbial expression systems) is also now 
widespread. 

As a result of these developments, production of pro- 
teins has become of major importance to the pharmaceutical 
and chemical industries in recent years. This has led to the 
development of a new area of activity called downstream 
processing of proteins. Most proteins currently produced 
commercially, arise either from microbial fermentations or 
from animal cell culture. Major limitations in this activity 
are the handling of large volumes, stability of the protein to 
pH and heat, incorrect folding and other post-translational 
processing of the protein (as a result of expression in a 
heterologous system). Inert, stainless steel containers are 
used instead of glass vessels for the storage and handling of 
the much larger volumes of extracts and buffers required. 
These may be sterilized and are robust enough for pro- 
cess work. Extracts and buffers used in purification are nor- 
mally pumped through stainless steel or plastic piping rather 
than manually transferred as in the laboratory scale purifi- 
cation. These pipes also are amenable to sterilization and 
cleaning. 

In general, the same overall techniques described in Sec- 
tion 2.4 are used but the scale of the column is much larger. 
Commercially available columns are constructed mainly 
from reinforced plastic or steel and can accommodate many 
liters of stationary phase. Usually, industrial scale columns 
retain similar length to those used in the laboratory scale 
but the column diameter is increased greatly. By stacking 
columns one on top of the other, it is possible to put in place 
hundreds of liters of stationary phase without extensive bed 
compression. An alternative column design is offered by ra- 
dial flow chromatography, where sample flows across rather 
than down the length of the column. This design facilitates 
higher flow rates than conventional columns of the same vol- 
ume. Sample enters the column from the periphery, flows 
across the stationary phase and then flows into a channel 
running through the centre of the column. 
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Automated systems for control of downstream process- 
ing such as BioPilot are now commercially available. In 
developing an industrial scale separation, it is usual to be- 
gin with the laboratory scale and then to deal with effects 
of scale-up with a pilot scale purification (an intermediate 
scale between laboratory scale and industrial scale). Only 
once these stages have been completed does industrial scale 
purification become a realistic proposition. As each stage of 
purification is developed and scale increases, the financial 
cost of purifying each unit of protein decreases significantly 
due to economies of scale. A major determinant in this cost 
is the concentration of protein in the original extract. Con- 
siderable thought should therefore be given to choice of 
expression system used for industrial scale purification. 


2.6 HIGH PERFORMANCE LIQUID 
CHROMATOGRAPHY (HPLC) 


2.6.1 Equipment Used 


A major contributor to the overall plate height equation is 
stagnant mobile phase mass transfer, H3 (see Section 2.2.4 
and Equation (2.8)). This represents diffusion of sample 
components into and out of the stagnant mobile phase of 
the stationary phase. This term is directly proportional to 
the square of particle diameter (d>). It is to be expected, 
therefore, that smaller values of d, would lead to large re- 
ductions in H and much improved chromatographic perfor- 
mance. One physical reason for this is that the surface area 
available for adsorption/desorption in such particles is pro- 
portionately much greater than in larger particles. This is 
why the theoretical plate number, N, is directly related to 
the surface area of the stationary phase particle. Adsorption/ 
desorption kinetics and consequent chromatographic perfor- 
mance are, therefore, expected to be significantly improved 
in stationary phases with small values of dp. 

Smaller diameter particles also result in greatly reduced 
flow rates, however, since they give greater resistance to mo- 
bile phase flow. This results in back pressure on the station- 
ary phase which, in turn, could cause serious band broaden- 
ing and long retention times due to distortion of the station- 
ary phase bed. Back pressure increases in inverse proportion 
to d? In open column chromatography, the stationary phase 
particles used (Section 2.5) are made up mainly of polysac- 
charide gels which are mechanically weak. Therefore, even 
if such particles were produced with small diameters, they 
would not be sufficiently strong to withstand the high pres- 
sures required to achieve high resolution chromatography. 
Silica-based particles (diameter 5-10 um), by contrast, al- 
low the use of high flow rates achieved by actively pumping 
mobile phase through the stationary phase under high pres- 
sure (up to 55 MPa). This is called high performance liquid 
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Figure 2.20. Overview of typical HPLC system. Low-pressure re- 
gion of the system where tubing may be plastic is denoted by dashed 


line. Solid lines denote stainless steel tubing required due to high 
pressure. 


chromatography (HPLC). A combination of high pressure 
with small particle diameter achieves high resolution in this 
type of chromatography format. The high pressure required 
by the system dominates design and construction of equip- 
ment used (Figure 2.20). In practice, below d, values of 
3-5 um the pressure required for flow in HPLC systems 
becomes impractical. 

Sample is injected through a neoprene/Teflon septum 
from a syringe which may be operated manually or au- 
tomatically from an autosampler. The latter system is espe- 
cially appropriate for analytical applications where multiple, 
repetitive analyses may be required. 

In order to achieve a stable baseline in elution profiles, it 
is important that flow of mobile phase be very stable (i.e. 
pulse-free). In HPLC, either constant volume (also known 
as constant displacement) or constant pressure pumps are 
used to achieve pulse-free flow of mobile phase. The 


latter maintain a constant pressure regardless of changes 
in stationary phase resistance to flow during chromatogra- 
phy. If this resistance increases, however, then the flow rate 
decreases automatically to maintain constant pressure. Con- 
stant volume pumps are more popular in HPLC, because 
they maintain a constant flow rate through the stationary 
phase during chromatography, increasing pressure if nec- 
essary, to overcome increased stationary phase resistance. 
The most common type of constant volume pump used is 
the reciprocating pump which uses a piston to deliver a fixed 
volume of solvent onto the column in repeated cycles of fill- 
ing and emptying. These pumps can introduce pulses into 
the flow but pulse dampeners are built into them to minimize 
this. 

In most HPLC chromatography, the mobile phase consists 
of a mixture of two or more components (A and B; e.g. 
acetonitrile and water) applied to the column to achieve 
rapid separation. The HPLC instrument therefore contains 
a mixer for efficient mixing of A and B. This may be of two 
general types. Dynamic mixers stir A and B together but 
have the disadvantage of a large dead volume. Static mixers, 
by contrast, mix A and B by an eddy diffusion process and 
normally have small dead volumes. 

Mobile phase is pumped around the system through stain- 
less steel tubing which is required as a result of the high pres- 
sure used. The column in which stationary phase is packed is 
also of stainless steel construction with internal diameters of 
2-5 mm for analytical separations and lengths of 5 to 30 cm. 
However, industrial-scale preparative HPLC columns may 
have dimensions of up to 15 x 100cm. Porous plugs of 
Teflon or stainless steel retain the stationary phase in the 
column. 

A variety of forms of detection are used in HPLC. Variable 
wavelength UV detection is especially popular in analysis of 
biochemical samples. For proteins, detection may be carried 
out at 280 nm (which is specific for aromatic amino acids) or 
220 nm (specific for peptide bonds) while 260 nm is useful 
for detection of nucleic acids. Other types of detectors are 
fluorescence and refractive index detectors. An example of a 
specialized approach to detection is postcolumn derivatiza- 
tion which may be used, for example, in amino acid analysis. 
Amino acids are separated chromatographically and then are 
derivatized with o-phthalaldehyde (OPA) postcolumn. The 
OPA conjugates formed are strongly fluorescent and may be 
detected with high sensitivity. 

The dimensions of HPLC columns can vary widely. 
A traditional column with an inner diameter of 2-5 mm 
might have a flow-rate range of approximately 1—10 ml/min. 
However, increasingly finer columns are being used in 
high-resolution separations with corresponding reduction 
in achievable flow-rates: Capillary columns have inner 
diameters in the range 100 um-1 mm (flow-rate range; 
0.4—200 pl/min); Nanobore columns have inner diameters 


CHROMATOGRAPHY 41 


in the range 25-100 uM (flow rate range; 25-4000 nl min). 
Nanobore HPLC achieves very stable submicroliter flow- 
rates which has facilitated interfacing with other analytical 
methods such as mass spectrometry. These are especially im- 
portant and useful in situations where only minute quantities 
of protein are available for analysis such as in proteomics 
(Chapter 10). 


2.6.2 Stationary Phases in HPLC 


The particles of which the stationary phase is composed 
have a rigid, solid structure rather than that of a soft gel such 
as is used in open column chromatography. Three main 
types of particle are used (Figure 2.21). Microporous parti- 
cles (5-10 uM) contain microscopic pores which run right 
through the particle, giving a large available surface area, 
and are often composed of silica or aluminium. Pellicu- 
lar (also known as superficially porous) particles (40 um) 
are also porous but these are coated on an inert core com- 
posed of glass or some other hard material. Porous sta- 
tionary phases especially useful for proteins and peptides 
may be composed of polymethacrylates or poly (styrene- 
divinylbenzene). These display excellent pH-stability 
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Figure 2.21. Stationary phase particles used in HPLC: (a) Micro- 
porous, (b) Pellicular and (c) Bonded phases. 
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Figure 2.22. HPLC bonded phases. (a) Silicate esters, (b) Silica- 
carbon and (c) Siloxanes. 


properties (from pH 2-12) and are mechanically very stable. 
The third type of particles are bonded phases in which the 
stationary phase is chemically bonded to an inert support 
such as silica (Figure 2.22). A disadvantage of silica-based 
stationary phases is that they can sometimes give cation ex- 
change effects and that they have limited stability above pH 


Table 2.5. Selected HPLC stationary phases 


9 and below pH 2. A selection of some of the more popular 
HPLC stationary phases is given in Table 2.5. 

Stationary phase particles are packed with a high pres- 
sure slurrying procedure which involves forming a slurry of 
stationary phase in a solvent of equal density and pumping it 
into the column under high pressure. This produces a dense, 
continuous bed of stationary phase which, ideally, is free of 
cracks and other imperfections. 


2.6.3 Liquid Phases in HPLC 


It is essential that the mobile phase used in HPLC be of 
high purity and be chemically unreactive. Under elevated 
pressure, air or gas in solvents can form bubbles which 
could damage stationary phase packing and interfere with 
chromatographic analysis. It is usually necessary to degas 
the solvents before chromatography, either by purging (also 
sometimes called sparging) with gas or by exposing the 
solvent to a vacuum. To avoid fouling or blocking of columns 
by particulate matter suspended in sample and solvents used, 
it is also usual to filter these through a 5 um filter and to 
introduce a guard column into the system itself before the 
chromatography column (Figure 2.20). 

A variety of mobile phase solvents may be used in HPLC. 
Choice of these is often a matter of experiment but they can 
be organized into a list reflecting their ability to displace 
adsorbed sample components called an elutropic series 
(Table 2.6). The value of eluent strength increases, gener- 
ally, with increase in polarity. A further important factor in 
choosing a mobile phase solvent is that it should not interfere 
with detection of sample components. Some solvents shown 
in Table 2.6 display strong UV absorbance and allowance 
should be made for this if using this type of detection. 


2.6.4 Two Dimensional HPLC 


It is often not possible to achieve a complete separa- 
tion of a biochemical extract in a single chromatogram. 
This is because there may be close similarity between 
components in terms of polarity, molecular mass, charge, 


Stationary phase Description Application 
Silica C-187 Bonded phase Reversed-phase chromatography of peptides 
Silica C-18° Bonded phase Reversed-phase chromatography of proteins 


Fractogel TSK HW-55° 
CM/DEAE-silica® 
Fractogel TSK HW-65-ligand@ 


Bonded phase 


Porous poly vinyl chloride 


Porous polyvinyl chloride 


Gel filtration of proteins 
Ion exchange chromatography 
Affinity chromatography 


with immobilised ligand 


“For details of chemical bonding see Figure 2.22. 
>See also Table 2.3. 


“For chemical structure of ion exchange groups see Table 2.2. 
4See Figure 2.16 for details of ligand immobilisation chemistry. 


Table 2.6. Elutropic series of HPLC solvents 


Solvent Eluent strength 
n-Pentane 0 
n-Hexane 0.01 
Cyclohexane 0.04 
Carbon tetrachloride 0.18 
2-Chloropropane 0.29 
Toluene 0.29 
Ethyl ether 0.38 
Chloroform 0.4 
Tetrahydrofuran 0.45 
Acetone 0.56 
Ethyl acetate 0.58 
Dimethylsulphoxide 0.62 
Acetonitrile 0.65 
2-Propanol 0.82 
Ethanol 0.88 
Methanol 0.95 
Ethylene glycol 1.11 


pl, and so on. However, if separation based on one physico- 
chemical parameter (e.g. charge characteristics) is followed 
by separation based on a second parameter (e.g. molecular 
mass), extremely high resolution can be achieved. This is the 
concept underlying two dimensional HPLC (2-D HPLC). 
Such multidimensional separations have a very large peak 
capacity (total number of peaks theoretically distinguish- 
able) as follows: 

Pup = Pı x P, x P3... (2.14) 
Where Pmp, Pi, P2... are the peak capacities of multidi- 
mensional, first dimension, second dimension, and so on 
separations. Common pairings of dimensions include ion 
exchange — reversed phase and gel filtration-capillary elec- 
trophoresis. As a fraction elutes from dimension 1, it is 
automatically passed to dimension 2. In this way extremely 
high resolution separations of complex mixtures of peptides 
or proteins are possible. As well as coupled columns, single 
columns containing two types of packing are possible. An 
example is an ion exchanger mixed with a reversed phase 
stationary phase. 2-D HPLC has potential in the field of 
proteomics where it complements two dimensional elec- 
trophoresis (Chapter 10). 


2.7 FAST PROTEIN LIQUID 
CHROMATOGRAPHY 


2.7.1 Equipment Used 


The high resolution and impressive versatility of HPLC have 
made it a widespread technique in biochemistry, analytical 
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and organic chemistry and many other areas of scientific 
investigation. It is especially useful in separation and analy- 
sis of low molecular weight biomolecules such as metabolic 
intermediates, building blocks of biopolymers and in the 
analysis of peptide mixtures. Although principally used in 
analytical applications, the technique may also be applied 
to the preparation of pure sample components, up to and 
including industrial scale, provided columns with suitable 
capacities are available. It is fair to say, however, that the 
technique does not usually lend itself readily to the purifica- 
tion of complex biopolymers such as proteins in their active 
form. There are a number of reasons for this. Firstly, some 
modes of chromatography commonly performed in a HPLC 
format (e.g. reversed-phase chromatography) require the use 
of organic solvents which will denature proteins. Secondly, 
although large capacity HPLC columns are available, they 
are very expensive and small capacity analytical columns 
are much more common. These latter columns place se- 
vere limitations on loadings achievable in HPLC (up to a 
maximum of approx. 200 ug, depending on the column). 

Pharmacia Biotech (currently GE Healthcare) pioneered 
Fast Protein Liquid Chromatography (FPLC) to address the 
particular needs of active biopolymer purification. Although 
this chromatography format (Figure 2.23) was originally de- 
veloped for protein purification, it has since been used for 
the purification of a wide variety of other biomolecules in- 
cluding DNA (plasmids, restriction fragments and oligonu- 
cleotides), nucleotide derivatives (e.g. NAD, ATP), carbo- 
hydrates as well as peptides and other low molecular weight 
biomolecules. 

The system uses piston-driven, reciprocating pumps. 
Each contains a pair of pistons operating in opposite direc- 
tions (one filling as the other empties) which avoids pulse 
formation. Because the pressure of the system is signifi- 
cantly lower than that of HPLC (approx. 1-2 MPa), compo- 
nents are constructed from glass and plastic rather than of 
steel as in HPLC. Sample is loaded by syringe into a loading 
loop (25-500 ul) or by a peristaltic pump into a superloop 
(10 ml). The contents of these loops are then pumped onto 
the column. Stationary phase is provided in prepacked high 
capacity (e.g. up to 20 mg per ml stationary phase) columns 
including superose and superdex (crosslinked agarose; av- 
erage dp, 13 um) for gel filtration, affinity and IMAC, sil- 
ica with bonded phases for reversed phase (d,, 5 um) and 
mono beads (a hydrophilic polyether) with bonded ion ex- 
change and chromatofocusing groups (dp, 10 um). These 
phases are highly compatible with aqueous buffers and salts 
commonly used in bioseparation. Automated valves and 
a modular design allow convenient assembly of the vari- 
ous components of the system. A program controller which 
may be further interfaced with a personal computer and a 
range of dedicated software options allow a very high de- 
gree of automation which facilitates rapid and reproducible 
chromatography. 
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Figure 2.23. Outline of FPLC format for high resolution chro- 
matography (adapted from Doonan, 1996). Small sample volumes 
(25-500 ul) may be loaded from a fixed volume sample-loop while 
larger volumes (several ml) may be loaded from a superloop via the 
arrangement shown. 


2.7.2 Comparison with HPLC 


This chromatography format has a number of similarities to 
HPLC; small particle diameter stationary phases are used 
(5-13 um); it is available in the various chromatography 
modes described in Section 2.4; mobile phase flow takes 
place in a pressurized system; sample and buffer compo- 
nents require filtration before chromatography. Important 
differences with HPLC, though, arise from the fact that this 
chromatography format was designed specifically with the 
needs of biopolymers in mind. Mainly aqueous solvents are 
used and the stationary phases have a relatively large load- 
ing capacity. Moreover, the system operates at much lower 
pressures than HPLC which simplifies the assembly and re- 
arrangement of components for particular purposes as leaks 
in the system are less common than in HPLC. 


2.8 PERFUSION CHROMATOGRAPHY 


2.8.1 Theory of Perfusion Chromatography 


Mass transfer effects within stationary phase particles are 
major contributors to band broadening in liquid chromatog- 


raphy. This is decreased by reduction of particle size but, 
as pointed out (Section 2.6), a lower limit of 3—5 um is im- 
posed by practical limitations of pressure on the system. 
Perfusion chromatography provides an alternative means 
of addressing this problem and combines unusually high 
flow rates with a novel type of stationary phase particle 
which allows extremely rapid separation. In order to un- 
derstand how this comes about, we need to review some 
aspects of the basic theory of chromatography described in 
Section 2.2. 

Mass transfer effects were given as H and H; in Equa- 
tions (2.8) and (2.9) and these equations may be combined 
into a simplified overall plate height equation as follows: 


1 2 
Be -dgy 
Dm 


H (2.15) 


where C’ represents the sum of Csm and Cm. It is clear from 
this that H increases with flow rate, thus giving increased 
band broadening at high flow rates. It is expected from the 
Van Deemter equation (Equation (2.11); Figure 2.6) that, at 
very high flow rates (in the range 1000-10000 cm h7'), the 
mass transfer effects term would dominate such that: 


H~C-v (2.16) 


This is because, at large v values, the product of C and v 
would also be large while the contribution of eddy diffusion 
is generally modest in liquid chromatography and that for 
longitudinal diffusion (B/v) would decline with increasing 
flow rate. 

The diffusion coefficient in the mobile phase, Dm, can be 
divided into diffusive and convective components such that: 


Vpore * dp 


Dm = D+ 217) 


where D is the diffusion coefficient and Vpore the flow rate 
in the pores of the stationary phase particle. High resolution 
chromatography formats such as HPLC and FPLC normally 
use v ~ 400 cm h™! and, in such systems, diffusive transport 
dominates. However, at high values of v, the convective term 
may become much more important such that: 


Vpore * dp 


Dn ~ 
2 


(2.18) 


There is a proportional relationship between vpore and v: 


(2.19) 


Vpore = Const. - v 


We can, therefore, insert Equation (2.19) into ((2.15): 


2. d? -v 
H = —— (2.20) 
const. v - dp 
2-d, 
=? (2.21) 
const. 


At very high values of v, H becomes largely indepen- 
dent of v and intraparticle diffusion becomes unimportant 
relative to intraparticle convection for protein samples. If it 
were possible in practice to achieve such high flow rates of 
mobile phase through stationary phase particles with small 
values of dp, then it would be possible to achieve high res- 
olution separations (low values of H) in extremely short 
times (as a consequence of high flow rates) which would be 
independent of flow rate. 

Perfusion chromatography allows us to realize this objec- 
tive in practice using perfusion. This involves a stationary 
phase particle which permits the use of very high flow rates 
with little back-pressure. Perfusive particles contain large 
(~1 um) channels through which rapid diffusion of mobile 
phase is facilitated (Figure 2.24). At the flow rates used 
in perfusion chromatography, convective flow is some ten 
times greater than diffusion. These channels are also some- 
times called perfusive pores or through-pores. Separations 
obtained with this stationary phase agree well with the the- 
oretical considerations described above. 


2.8.2 Practice of Perfusion Chromatography 


Particles used in perfusion chromatography are called 
POROS and are composed of poly (styrene-divinylbenzene). 
They have small dp, (10, 20 and 50 um) and the stationary 
phase (which can contain ion exchange, reversed phase, 
hydrophobic interaction, IMAC and other affinity groups; 
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chromatography 
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Figure 2.24. A Perfusion chromatography particle. Mobile phase 
flow is denoted by arrows in a conventional and perfusion chro- 
matography particle. Note improvement in mobile phase flow due 
to perfusive channels. 
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Table 2.7) is applied as a thin film to this support. Two types 
of pores transect this particle. Very large through pores 
(6000-8000 Å) are responsible for rapid flow through the 
particle. They contain a very small amount of the overall 
surface area of the particle, however. Diffusive pores are 
much smaller (500-1000 Å) and connect the through-pores. 
In these latter pores, which are very short (<1 um), diffu- 
sion is more important than convective flow and their surface 
represents the bulk of the particle surface-area. This porous 
network with its large surface area contributes to the very 
high capacity of POROS particles. 

Perfusion chromatography may be performed in columns 
packed with POROS particles in open column, HPLC or 
FPLC chromatography systems. Separation of proteins in 
as little as 15—60s is possible because of the high flow 
rates achievable with this stationary phase. An interesting 
adaptation of the procedure is immunodetection, where the 
steps of an ELISA assay are performed extremely rapidly in 
a flow-through column format using immobilized antibodies 
to a molecule of interest. 


2.9 MEMBRANE-BASED 
CHROMATOGRAPHY SYSTEMS 


2.9.1 Theoretical Basis 


So far, we have focused on chromatography systems which 
use stationary phases made up of particles with diameter, 
dp. This parameter is crucial to achieving high resolution 
separations but small d, values lead to a requirement for 
high pressure chromatography. Moreover, a lower limit of 
3-5 um is placed on dp as a result of impracticably high back 
pressures below this value. An alternative approach is the use 
of membrane-based rather than particle-based stationary 
phases. In membrane-based chromatography, the stationary 
phase is composed of thin, porous membranes which are 
layered on top of each other. These may be coated with 
ion exchange, affinity or other groups, providing a range of 
chromatography modes. The membranes are contained in a 
cartridge which may be used in place of a column in open 
column, FPLC and HPLC formats. 

When a sample is applied to such a system, it is found 
that diffusion does not impose a limit on mass transfer to 
the stationary phase. This process seems to occur largely 
by convection resulting in much faster adsorption kinetics 
(e.g. 200-300-times faster than in agarose-based systems). 
Moreover, by layering many membranes on top of each 
other (to a thickness of ~10 mm) and by increasing mem- 
brane diameter, it is possible to achieve high capacities in 
the cartridge (15-30 mg protein). It is also possible to use 
very high flow rates (up to 10 ml/min in routine use) without 
generating high back pressures. The reason for these unusual 
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Table 2.7. Principal types of Perfusion Chromatography stationary phases 


Resin Chemical group Use 
HQ Quaternised Strong anion exchanger useful for preparative applications 
polyethylenciminc 
QE Quatcrnised Strong anion exchanger as alternative lo POROS HQ 
polyetltyleneimine 
DEAE Diethylaminoethy] Weak anion exchanger (pH 3-9) 
PI Polyethyieneimine Weak anion exchangere (pH 3-9) 
HS Sulphopropyl Strong cation exchanger useful for preparative applications 
SP Sulphopropyl Strong cation exchanger as alternative to POROS HS 
S Suiphoethy] Strong cation exchanger useful for separation of very hydrophobic 
and/or basic proteins/peptides 
CM Carboxymethyl Weak cation exchanger (pH range 3-8) 
HP2 High-density phenyl Hydrophobic interaction resin useful for weakly hydrophobic 
proteins 
PE Phenyl ether Hydrophobic interaction (moderately hydrophobic proteins 
ET Ethyl ether Hydrophobic interaction resin useful for strongly hydrophobic 
proteins 
OH Hydroxyl Can be activated for affinity chromalography 
AL Aldehyde Preactivated for coupling of primary amines 
EP Epoxide Preactivated for coupling primary/secondary amines —SH, or 
—OH groups 
NH Primary amine Pre-activated for coupling proteins sugars, and ligands with poor 
stability under reducing conditions 
HY Hydrazide Preactivated for site-specific immobilization of IgG 
A Recombinam protein A Isolation of IgG especially human and humanised antibodies 
G Recornhinaw protein A Isolation of mouse IgG 
HE Heparin Purification of coagulation proteins DNA-binding proteins and 
lipoproteins 
MC Imido-diacetate Binding of metals (IMAC; his-tagging) 
RI Base poly(styrene divinyl Reversed-phase chromatography of very hydrophobic proteins or 
benzene) peplides 
R2 Base poly(slyrene divinyl Reverscd-phasc chromatography of hydrophobic proteins or 
benzene) peptides 
Poros/yme Trypsin, endoproleinase Rapid digestion of proteins lo produce peptides 
glu-C, pepsin and pa pain 
ImmunoDelection In addition to EP. AL A, G 
Cartridges: and HE ehcmisiries (see 
above). 
BA Streptavidin Noncovalem binding of biotinylated ligands 
XL Protein G Adsorption of anitbody followed by covalent crosslinking 


observations seems to be that there is a broad pore size dis- 
tribution in the membrane which is responsible for high flow 
rates while large transport pores contribute to a high surface 
area available for adsorption within the membrane. In these 
respects, the system resembles (particle-based) perfusion 
chromatography (Section 2.8). 


2.9.2 Applications of Membrane-Based Separations 


A number of cartridges suitable for membrane-based chro- 
matography are commercially available (Figure 2.25). These 
cover the principal modes of chromatography described in 
Section 2.4 with the exception of gel filtration. Affinity ap- 


plications are especially popular in membrane chromatogra- 
phy with antibodies, metals and other ligands being immo- 
bilized to the membrane essentially as described in Sections 
2.4.5 and 2.4.6. A feature of membrane-based affinity chro- 
matography is that it appears to give significantly higher 
efficiency of ligand utilization than agarose-based systems. 
This is due primarily to the greater degree of convection 
allowing more access to immobilized ligands. A further 
feature of membrane-based systems is that they are not as 
limited in their geometry as are column-based systems. Fil- 
tration systems are available in a wide variety of designs, 
unlike chromatography columns. It is possible to imagine 
downstream processing, for example, in a membrane-based 


(a) Feed in 
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Figure 2.25. Membrane filtration chromatography. (a) Conven- 
tional membrane design, (b) Cross-flow hollow fiber filtration unit. 
Note that, in the cross-flow filtration module, proteins must first pass 
through a semipermeable membrane to reach affinity groups which 
may be attached to gel beads thus ensuring filtration of sample to 
be purified. 


system which would not be possible in a column due to lim- 
itations imposed by column dimensions, particle properties 
and their effects on flow rate, back-pressure and resolution. 
An example of the type of unusual design formats available 
would include cross-flow filtration (Figure 2.25). 


2.10 CHROMATOGRAPHY OF A 
SAMPLE PROTEIN 


2.10.1 Designing a Purification Protocol 


The combination of steps used in purification of a particular 
protein is called a purification protocol. Protein purification 
is often a matter of balancing variables which may be inter- 
related in a very complex way. The most important of these 
are: 


° Stability properties of the target protein. Proteins differ 
considerably from each other in terms of their stability 
to temperature, pH and oxidation. Either when removed 
from the cell or else when expressed in the alien envi- 
ronment of a heterologous expression system, the protein 
may denature, losing its structural integrity and associ- 


e 
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ated biological activity. This may not matter if only a 
small sample of protein is required for structural studies 
such as protein sequencing. Usually, though, structurally 
intact, active protein is required. This sensitivity of pro- 
teins can be taken advantage of in the purification protocol. 
Where proteins display unusual stability to heat and pH, it 
is possible to use heat/pH treatments selectively to dena- 
ture contaminant proteins without affecting the protein of 
interest. Inadvertent release of proteases from subcellular 
organelles during preparation of cell extracts is a com- 
plication associated with cell disruption (e.g. release of 
cathepsins from lysosomes) resulting in proteolytic degra- 
dation of the protein of interest. This may be prevented 
by the inclusion of protease inhibitors in homogenization 
and chromatography buffers. An example is the inclu- 
sion of the metal chelator ethylenediaminetetraacetic acid 
(EDTA) to minimize metalloproteinase activity. 
Concentration. The intracellular protein concentration is 
of the order of 200 mg/ml. When extracting proteins from 
cells, we dilute the protein concentration (often to less 
than 1 mg/ml) and this can affect the structural stability 
of proteins. Since protein purification is a process of re- 
taining the activity of interest whilst decreasing protein 
concentration, this can mean that biological activity may 
decrease during purification. We can detect this as a de- 
crease of specific activity (activity/mg protein) from step 
to step. This may be minimized by addition of a stabilizer 
to the extract such as bovine serum albumin which can 
keep the protein concentration high enough to maintain 
activity. 

Expression system. When using heterologous expression 
systems expression levels in the range 5—20% of total cell 
protein can be achieved. This may prove toxic to the cell 
and the cell may lack enzymes or other proteins neces- 
sary for correct post-translational processing. As a result, 
the protein may be expressed as insoluble plaques or in- 
clusion bodies (also sometimes called refractile bodies). 
Since these have an unusually high density, they may be 
separated from the rest of the cell proteins by centrifuga- 
tion (Section 7.3). Subsequently, plaques can be dissolved 
with the addition of urea and the protein refolded cor- 
rectly. In this way, inappropriate expression of protein in 
plaques can be included in the purification protocol. A fur- 
ther option offered by heterologous expression systems is 
the possibility of expressing the protein as a fusion includ- 
ing polypeptides capable of recognizing affinity-ligands 
such as glutathione or immobilized metals (Examples 2.5 
and 2.6). Inclusion of a protease cleavage site allows the 
subsequent removal of the affinity site from the purified 
fusion. 

Inadvertent removal of cofactors or proteins essential 
for activity. If there is a reduction in specific activ- 
ity during purification, consideration should be given to 
the possibility of the inadvertent removal of low M, 
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cofactors such as metals or coenzymes. This can be 
checked by adding candidate cofactors to highly purified 
preparations. It is also possible that a protein may require 
the presence of other proteins in a multi-protein complex 
for full activity. These other proteins also may be inadver- 
tently removed during purification, leading to an apparent 
decrease in specific activity. This finding may provide the 
first direct evidence for the existence of a protein complex. 
Intrinsic chemical and biological properties. A wide va- 
riety of chromatography modes are available to us which 
take advantage of different chemical and biological prop- 
erties of proteins (Section 2.4). The strategy adopted in 
the design of a purification protocol, is to take advantage 
of small differences in such properties to maximize the 
yield of the protein of interest while minimizing the levels 
of other proteins at each step. 

Protocol design. Most chromatography modes are avail- 
able in a variety of chromatography formats which, as we 
have seen, may be of two general types: high-capacity and 
low resolution such as in open column chromatography or 
low-capacity and high resolution as in high performance 
chromatography methods (Sections 2.5—2.10). In design- 
ing a purification protocol, we take advantage of the matrix 
of chromatography modes and formats available to us to 
minimize the number of steps required for purification, to 
maximize the purity of the final product and to retain the 
protein in its native state, thus preserving its biological 
properties. A key factor in achieving this is the order in 
which the steps are carried out. 


e 


e 


Early steps of the protocol often involve decreasing the 
protein concentration of the extract to allow the use, sub- 
sequently, of high resolution methods. Open column chro- 
matography methods are appropriate for these early steps. 
Once total protein has been reduced to milligram amounts, 
more high performance methods may be used where ad- 
vantage is taken of high resolution separation. It is wise 
to perfect each step by assessing a variety of conditions to 
maximize the purification factor achieved. For example, in 
ion exchange chromatography, it is wise to carry out trial 
separations with a range of gradients and at a variety of pH 
values. In affinity chromatography, a variety of ligands and 
elution techniques might be assessed. It may also be advan- 
tageous to study the effect of altering the precise order of 
the chromatography steps to maximize biological activity 
and purity of the protein of interest. 

Many of the methods described in this book can be used 
to assess the purity of the protein at each step of the pu- 
rification protocol. Of particular importance in this regard 
are analytical chromatography and electrophoretic methods 
(Chapter 5). The number and nature (e.g. M,, hydrophobic- 
ity) of contaminants at each step of the protocol is assessed. 
These findings facilitate taking advantage of differences in 
biophysical properties between the protein of interest and 
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Figure 2.26. Possible purification protocols for separation of rat 
liver cytosolic GSTs. Note how the order in which steps are carried 
out may be varied. 


Elute with GSH 


contaminant proteins (e.g. large differences in M, or hy- 
drophobicity). 


2.10.2 Ion Exchange Chromatography of a Sample 
Protein: Glutathione Transferases 


Glutathione transferases (GSTs) are mainly cytosolic en- 
zymes which serve to protect cells from foreign chemicals 
which have the potential to be genotoxic. They accomplish 
this in two main ways: by catalyzing a conjugation reac- 
tion with the intracellular antioxidant tripeptide, glutathione 
(GSH) and by noncatalytic binding of a wide range of en- 
dogenous and foreign chemicals. This latter property may 
explain why the GSTs are the most abundant cytosolic pro- 
teins in mammalian liver comprising 2-5% of total protein. 
In common with other drug detoxification enzymes, GSTs 
exist in multiple isoenzymes which need to be purified in or- 
der to study their individual properties. This presents a chal- 
lenge since isoenzymes may display considerable structural 
similarities to each other. This is the kind of purification 
problem which frequently presents itself in biochemistry. 
An overview of possible purification protocols for rat liver 
GSTs is given in Figure 2.26. 

One of the principal methods for fractionation of GST 
isoenzymes is ion exchange chromatography (Example 2.7). 
Before application to a cation exchange column, however, 
it is advantageous to pass the extract through an anion ex- 
change column (DEAE-cellulose) since most cytosolic pro- 
teins have anionic pls. Prior to application of extract to ei- 
ther ion exchange resin, it is necessary to desalt by passage 
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Example 2.7 Ion exchange chromatography of glutathione transferase isoenzymes 


At all pH values apart from their isoelectric point (pI) proteins possess either positive or negative net surface charge 
(Figure 2.9). Advantage can be taken of difference in surface charge to achieve separation of otherwise very similar 
proteins by ion exchange chromatography. Depending on their net charge, proteins can exchange with either positive ions 
such as Na* or negative ions such as Cl~ on cation and anion exchange resins, respectively (Figure 2.11). These ions 
are electrostatically attracted to oppositely charged groups on the resin such as CM- and DEAE, respectively (Figure 
2.10; Table 2.2). This experiment is highly dependent on pH since this affects the charge of amino acid side-chains on 
the protein surface. 

Ion exchange chromatography is particularly useful in achieving separation of structurally-related proteins such as 
those forming complex protein families as described in Example 2.3. The GSTs are a good example of such a family 
since they are present in most mammalian tissues in the form of multiple isoenzymes. The bulk of the cytosolic GSTs 
may be separated from other non-GST proteins by passage of an extract through a glutathione (GSH)-agarose affinity 
column as described for GST fusion proteins in Example 2.5. Individual isoenzymes can then be routinely separated by 
ion exchange chromatography. 

The affinity extract was applied to CM-Cellulose at pH 6.7 and the column was developed with a 0-75 mM KCl 
gradient. Open circles denote activity with 1-chloro-2,4-dinitrobenzene (CDNB) while filled circles are activity with an 
alternative substrate, 1,2-dichloro-4-nitrobenzene (DCNB; GSTs differ in their preference for these and other substrates). 
Elution positions of isoenzymes were determined by electrophoretic analysis across peaks of activity and also by the 
presence of activity either with one or both substrates: AA, rGST Al-1; A, rGST M3-3; B, rGST A1-2; C, rGST M3-4. 
KCI gradient was monitored by measurement of conductivity (dashed line). 
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By carrying out this experiment at other pH values, a slightly different elution profile would be obtained. Moreover, 
by varying the salt gradient (e.g. using 0-100 mM KCI) the separation of individual peaks may be optimized. This 
example illustrates an important principle in protein purification. Namely, that proteins which behave identically in one 
chromatographic system such as binding to an affinity resin, may behave very differently in another system (e.g. cation 
exchange). Protein purification protocols exploit these often very small differences in chromatographic behaviour. 

Reproduced from Sheehan, D. and Mantle, T.J. (1984), Evidence for two forms of ligandin (YaYa dimmers of 
glutathione S-transferase) in rat liver and kidney. Biochemical Journal 218, 893-897, by permission of Portland Press. 
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through Sephadex G-25 (i.e. by gel filtration). CM-cellulose 
ion exchange chromatography of cytosolic GST activity is 
shown in Example 2.7. These enzymes have generally basic 
pls, since they bind to CM groups at pH 6.7. They also bind 
to the ion exchange group with different affinities and we 
can fractionate them with a 0-75 mM KCI gradient. If the 
proteins were more strongly bound we might have needed 
to use a higher KCI concentration (perhaps 500 mM). This 
chromatography is not enough completely to purify individ- 
ual isoenzymes since they still contain significant amounts 
of non-GST protein. These may be conveniently removed 
by passage through a GSH-agarose column which retains 
only GSTs in the mixture (i.e. by affinity chromatography). 
Passage of 5mM GSH through the affinity column elutes 
purified GST. This purification may also be analysed by 
SDS PAGE and conveniently managed by use of a purifi- 
cation table, which summarizes the relative effects of each 
step in the protocol on total protein and enzyme activity. 

It should be noted that this is not the only protocol possible 
for GST purification. Possible points of variation include: 
rearranging the order of steps; varying pH of ion exchange 
steps to alter the precise pattern obtained; introducing high 
performance to the protocol (FPLC); taking advantage of 
differences in KS between them by applying GSH as 
a gradient (0-5 mM) rather than as a 5mM wash (affinity 
gradient elution; Section 2.4.5). 


2.10.3 HPLC of Peptides From Glutathione 
Transferases 


Individual isoenzymes of GST are often closely-related 
structurally with up to 99% identity in amino acid se- 
quence between pairs of isoenzymes. It is often required 
to compare newly-discovered isoenzymes by peptide map- 
ping with trypsin. Where a large degree of sequence iden- 
tity exists, this results in a number of common peptides on 
C-18 reversed phase HPLC. This approach allows compar- 
ison within complex protein families (e.g. haemoglobins, 
immunoglobulins). A comparison of the peptide maps gen- 
erated by tryptic digestion of GST subunits 3 and 4 is shown 
in Example 2.3. These subunits possess approximately 80% 
amino acid sequence identity and are capable of dimerizing 
to generate up to three isoenzymes (two homodimers and a 
heterodimer). In humans, a structurally related GST is miss- 
ing from a significant number of the population (the null 
Phenotype) and this may predispose some smokers to more 
rapid development of lung cancer. Reversed phase HPLC 
analysis allowed the visualization of peptides common to 
both subunits and also the identification of subunit-specific 
peptides. Sequencing of these could facilitate further stud- 
ies such as the design of oligonucleotides for cloning of 
genes coding for the different proteins. More detailed com- 
parisons are possible by use of other proteolytic agents or 
by carrying out high resolution analysis of peptide maps us- 


ing techniques such as mass spectrometry (Chapter 4) and 
capillary electrophoresis (Section 5.10). It is now possible 
to interface both of these groups of procedures with HPLC 
peptide separation. 
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Chapter 3 


Spectroscopic Techniques 


Objectives 


e The Electromagnetic spectrum, 


After completing this chapter you should be familiar with: 


¢ The concept of transition between energy levels, 
* The variety of spectroscopic techniques available, 
e Applications of spectroscopic techniques in biochemistry. 


Electromagnetic radiation spans a wide range of energy lev- 
els in a continuum called the electromagnetic spectrum. 
Each part of this spectrum may be characterized by a spe- 
cific range of wavelengths. Living organisms are constantly 
exposed to a wide range of electromagnetic radiation from 
the Sun or other stars. Man-made sources of radiation in- 
clude heated filaments of the type found in ordinary light 
bulbs. We can select for different parts of the electromag- 
netic spectrum using a prism or similar device. Depending 
on its energy content, radiation from different parts of the 
electromagnetic spectrum may interact with biomolecules 
in a wide variety of ways. Electromagnetic radiation there- 
fore provides a useful means to probe the chemical structure 
of biomolecules. In this chapter we will look at the electro- 
magnetic spectrum and describe some of the experimental 
information it is possible to derive from studying interac- 
tions between electromagnetic radiation and biomolecules. 


3.1 THE NATURE OF LIGHT 


3.1.1 A Brief History of the Theories of Light 


Light is composed of electric and magnetic fields, which are 
mutually perpendicular and which radiate out from a source 
in all directions (Figure 3.1). It is therefore a form of electro- 
magnetic radiation. A wave description of electromagnetic 
radiation resulted from the work of Maxwell, Hertz and oth- 
ers in the nineteenth century. In this description, electric 
and magnetic fields are propagated through space as wave 
functions which may be characterized by wavelength, à (the 
distance from one part of the wave to the corresponding 
position on the next wave) and frequency, v (the number of 
times a wave passes through a fixed point in space every 
second). These parameters are related to the energy content 


of the wave, E, by Equations (3.1) and (3.2): 


ja (3.1) 
== 


CPi oy (3.2) 


where h is Planck’s constant and c is the speed of light. These 
equations demonstrate that there is a fixed relationship be- 
tween the energy of a particular type of electromagnetic ra- 
diation on the one hand and its wavelength/frequency on the 
other. High energy radiation is characterized by short wave- 
lengths and high frequency while lower energy radiation is 
characterized by long wavelengths and low frequencies. As 
shown in Figure 3.1, waves also have characteristic ampli- 
tude which is the maximum value the electric or magnetic 
vector can have. When two waves are superimposed, they 
produce a resultant wave which has amplitude equal to the 
sum of that of the individual waves. This is the principle of 
superposition and it is characteristic of all wave functions. 
Electromagnetic radiation from the sun covers a wide 
range of wavelengths in a continuum called the electromag- 
netic spectrum (see below Section 3.2.1). Light visible to 
humans as the colour-range red to violet (A = 720-400 nm) 
represents only a tiny part of this spectrum called the vis- 
ible range. This is because protein receptors in the eye are 
sensitive only to a comparatively small set of wavelength 
values. When white light from the sun impacts on a material 
such as a painting or a flower, some visible wavelengths are 
absorbed. What we experience as colour is simply wave- 
lengths in the visible part of the electromagnetic spectrum 
which happen to be reflected from materials of a particu- 
lar chemical composition. To understand how light interacts 
with matter and especially with biomolecules, we need to 
review the physical nature of light. The development of our 
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Figure 3.1. Light is a form of electromagnetic radiation. (a) When 
a filament is heated, light is emitted or radiated in all directions. (b) 
Light consists of mutually perpendicular electric (E, solid arrows) 
and magnetic (B, dashed arrows) vectors which are wave functions. 
The wave may be characterized by wavelength (A) and amplitude 
(A). 


understanding of light may be traced by describing three 
classic experiments on the interaction of light with matter. 
In the seventeenth century, Sir Isaac Newton demon- 
strated that it was possible to separate light of different 
colours from the Sun’s white light (Figure 3.2). He made a 
tiny perforation in the window-blind of his study in Cam- 
bridge. This selected for a thin beam of white light. When he 
passed this beam through a triangular piece of glass called a 
prism, he could separate white light into a range of colours. 
This phenomenon is now called diffraction and the prism 
in this experiment serves as a monochromator (i.e. a device 
to select for light of single wavelength). Newton’s conclu- 
sion that white light is really a mixture of light of different 
colours was quite controversial in its time since, from an- 
tiquity, it had always been assumed that white light was 
‘pure’ and colours were ‘impure’. In 1801, Thomas Young 
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Figure 3.2. Diffraction of white light. Newton selected for a thin 
beam of white light by making a tiny hole in the blind of his study. 
After passage through a prism, this light was separated into light of 
different colours denoted by arrows. 
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Figure 3.3. Apparatus for the photoelectric effect experiment. 
When light of a particular intensity (/) and frequency v) is shone on 
a metal plate, electrons are released. These cause a current which 
may be detected by an ammeter (A). A retarding (or stopping) volt- 
age (V) may be provided which stops the flow of electrons to the 
cathode. The value of this stopping voltage can be determined with 
light of different J and v values (Figure 3.4). 


demonstrated that diffraction is a consequence of interfer- 
ence and is characteristic of wave functions. It was therefore 
concluded that light is a wave phenomenon. 

However, in the early twentieth century two experiments 
were performed which appeared to contradict a wave de- 
scription of light. The first of these was called the photoelec- 
tric effect experiment. This involved light of a single wave- 
length being shone on a metal plate which formed part of an 
electric circuit (Figure 3.3). The frequency and intensity (i.e. 
amplitude) of this light could be varied. Electrons ejected 
from the metal plate could be detected as a current which 
could be measured in the electrical circuit (hence photoelec- 
tric effect). It had been expected that, when the frequency 
(i.e. energy) of light was continuously increased, release of 
electrons would occur in a continuous fashion. However, 
this was not observed (Figure 3.4). Instead, a threshold fre- 
quency needed to be reached before electrons were ejected. 
Below this threshold, regardless of the intensity of the light, 
no electrons were detected. Above the threshold frequency, 
it was expected that the current measured would increase 
with the intensity of the radiation. Instead, it was found 
that, irrespective of this intensity, the kinetic energy of each 
electron released appeared to be constant for a given metal. 
Increasing intensity of light indeed released more electrons 
but these had the same average kinetic energy. The threshold 
frequency has a characteristic value for each metal, regard- 
less of the intensity of the light. A plausible explanation for 
these findings was that light energy was taken up in minute 
‘packets’ or quanta. This led to the quantum theory of light 
proposed by Max Planck and Albert Einstein. 

The third experiment, which is generally similar to that of 
Newton involved passing an electric current through a tube 
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Figure 3.4. The photoelectric effect experiment. (a) Light of dif- 
ferent intensity (Z; to 73) has the same frequency (v) and wavelength 
(A). (b) Current measured in apparatus shown in Figure 3.3 at a va- 
riety of v values. Regardless of the light intensity, current is only 
detected at values above the threshold frequency vo. (c) Stopping 
voltage increases linearly with v. Different metals give a line of 
identical slope but differing vg values. This plot shows that V is 
proportional to v at values above vo. 


filled with hydrogen gas (Figure 3.5). This generated the 
release of electromagnetic radiation of a much more limited 
range of wavelength values than that of the Sun. The emit- 
ted radiation could be detected on a photographic plate. This 
was called the hydrogen emission experiment and the spec- 
trum of radiation emitted was called the hydrogen emission 
spectrum. A wave description of light would predict that a 
continuum of wavelengths would be found in this emission 
spectrum reflecting a range of possible energy values for the 
hydrogen atom. The emission spectrum actually observed, 
however, consisted of a series of discrete lines suggesting 
that hydrogen had a number of discrete energy-levels. Neils 
Bohr (1913) rationalized this by postulating that the elec- 
tron of the hydrogen atom had quantized energy levels which 
could adopt values of: 


(3.3) 
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where n is the electronic quantum number. Bohr explained 
the hydrogen emission experiment as follows (Figure 3.6). 
Electrical energy stimulates promotion of electrons from a 
ground state (n = 1) to an excited state (e.g. n = 2,3, 4, 
etc.). These electronic energy levels are quantized (i.e. of 
only certain allowed energy values). When the electrical 
stimulus is removed, the excited electron can return to 
the ground state with the emission of light energy. The 
wavelengths of the hydrogen emission spectrum (i.e. the 
individual lines on the photographic plate) correspond to 
transitions between the various electronic energy levels 
(N=2—>N=1;N=3-—- N = 1,etc.) called electronic 
transitions. 


3.1.2 Wave-Particle Duality Theory of Light 


The quantum theory of Planck and Einstein led to the sugges- 
tion that light had a particulate nature and could be regarded 
as consisting of tiny particles called photons. This particu- 
late nature, however, appeared to contradict the diffraction 
of light and the fact that light passes freely through a vac- 
uum which are consistent with a wave description of light. 
The competing wave and particulate theories were unified 
in 1923 by Louis de Broglie in his wave-particle duality 
theory. This postulated that light had both wave and parti- 
cle natures. Moreover, de Broglie postulated that particulate 
matter also had some characteristics of a wave nature. This 
is summarized in de Broglie’s relationship: 


ta (3.4) 
P 

where p is the momentum (mass x velocity) of a moving 
particle such as an electron. We can use this relationship to 
compare the relative magnitude of the wave and particle na- 
tures of a tiny object such as an electron with a macroscopic 
object such as a golf ball (Example 3.1). This comparison 
shows that both objects have both wave and particle natures 
but that the particulate nature dominates the golf ball while 
the wave nature dominates the electron. 


3.2 THE ELECTROMAGNETIC 
SPECTRUM 


3.2.1 The Electromagnetic Spectrum 


We have observed that the human eye may only detect a 
comparatively small part of the electromagnetic spectrum 
(Figure 3.7). In fact, some insects see parts of the spec- 
trum which are invisible to us because the protein recep- 
tors in their eyes can detect a slightly different range of 
wavelengths and this is important in recognition of suitable 
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Figure 3.5. The hydrogen emission experiment. When excited by an electric discharge, hydrogen gas emits light. This may be diffracted 
through a prism and detected on a photographic plate. A series of lines of particular v and A are observed rather than a continuum. A 
completely different spectrum is obtained for helium and other elements. 


flowers by honey bees, for example. Nonvisible parts of the 
electromagnetic spectrum are of relevance to biochemistry 
because they interact with matter, especially biomolecules, 
in particular ways. Important types of nonvisible radia- 
tion include infrared (1.e. below red), ultraviolet (i.e. above 


n=1 


Figure 3.6. Electronic transitions. The hydrogen emission experi- 
ment may be explained by postulating that electrons are promoted 
from their ground state (n = 1) to an excited state (e.g. n = 2-5). 
Return of an electron to the ground state involves an electronic tran- 
sition between quantized energy levels which releases energy (AE) 
corresponding to a light frequency, v, by the relationship AE = hv. 
These beams of light are detected as lines on the photographic plate 
used in the hydrogen emission experiment. 


violet), microwave, radiowave and X-rays. The precise na- 
ture of the interaction of these different types of electro- 
magnetic radiation with matter differs and hence we need 
specific equipment or experimental approaches for each. 
However, they are all part of the same electromagnetic spec- 
trum, differing from each other in terms of their wavelength, 
frequency and energy (Equations (3.1) and (3.2)). 


3.2.2 Transitions in Spectroscopy 


The foregoing allows us to understand that biomolecules 
may be expected to absorb light energy to promote electrons 
from their ground to excited states. In order to return to 
the ground state, it is necessary for the electron to lose 
the extra energy it has acquired. This extra energy has a 
value which may be represented as AF (i.e. the energy 
value of the excited state minus that of the ground state; 
Figure 3.6). Because only certain values are allowed for the 
various energy levels, the Hydrogen emission experiment 
results in a number of distinct frequencies of radiation as a 
consequence of Equation (3.2): 


AE, =h. V1, 
AE, =h. V2, (3.5) 
AE; =h. V3, 


where v1, v2, and so on represent the frequencies resolved on 
the photographic plate. These frequencies occupy the visible 
and ultraviolet parts of the electromagnetic spectrum. 
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Example 3.1 De Broglie’s relationship and the wave-particle duality 


De Broglie’s relationship: 


A= 
P 


Where h is Planck’s constant (6.626 x 10777 erg s) implies that all matter has both a wave and a solid (i.e. particle) nature. 
We can illustrate this by considering an electron and a golf-ball traveling at the same velocity; 100 m/s. The electron 
has a mass of 9.11 x 10778 g while the golf-ball is much larger (10 g). The momentum of each may be calculated by 
multiplying mass by velocity as follows and de Broglie’s relationship can be used to calculate the wavelength associated 
with the wave nature of each: 


Property Electron Golf ball 
Mass (g) 9.11 x 10-78 10 
Velocity (m/s) 100 100 
Momentum (g m/s) 9.11 x 10-6 1000 
Wavelength (cm) 7.27 6.626 x 107? 


From this table, it is clear that the wave properties of the electron are more significant than its particle properties (e.g. 
momentum). Conversely, the particle properties of the golf ball are quantitatively more significant than its wave properties. 
The de Broglie relation means therefore that wave properties are more pronounced at the atomic level while particle 


properties are more significant at the macroscopic level. 
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Figure 3.7. The electromagnetic spectrum. The electromagnetic 
spectrum forms a continuum of wavelengths (A) and frequencies (v). 
The various categories overlap with each other with the exception of 
visible light. Note that visible light represents only a small part of the 
spectrum. Because the energy of each category of electromagnetic 
radiation differs, they interact with matter in distinct ways. 


This phenomenon of transition between energy levels is 
not confined to electrons, however. For example, chemical 
bonds can occupy a variety of vibrational energy levels and 
atoms connected by single covalent bonds may rotate rel- 
ative to each other through a number of rotational energy 
levels (Figure 3.8). These energy levels are also quantized, 
having only a set number of allowed values. Transitions can 
occur between them corresponding to particular frequencies 
as described in Equation (3.5). Because energy increments 
between vibrational energy levels are much smaller than 
those for electronic energy levels, the frequencies of radi- 
ation corresponding to these transitions are in the infrared 
part of the spectrum (i.e. longer à and lower v than the vis- 
ible and ultraviolet). Similarly, energy increments between 
rotational energy levels are smaller even than those between 
vibrational energy levels and correspond to frequencies in 
the microwave part of the spectrum (i.e. longer à and lower 
v than the infrared). 

We will see in this chapter that transition between en- 
ergy levels is a common feature of the interaction between 
biomolecules and electromagnetic radiation. However, due 
to instrumental and experimental limitations, only some of 
the transitions are of use in the study of biomolecules. The 
energy increments (AE) and frequencies associated with 
transitions in the range useful in biochemistry are summa- 
rized in Table 3.1. 
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Figure 3.8. Non-electronic transitions. (a) Vibrational transitions 
involve alternating between a number of allowed bond lengths and 
bond angles, of differing energy. The energy increment between 
vibrational energy levels is denoted by AE. (b) Rotational tran- 
sitions. Atoms bonded together can rotate relative to each other. 
The different rotation states have distinct energies and the energy 
increment between them is denoted by AE. 


3.3 ULTRAVIOLET/VISIBLE 
ABSORPTION SPECTROSCOPY 


3.3.1 Physical Basis 

We have seen that light from the Sun is white but, when it 
impacts on a coloured object, some of this light is absorbed 
and it is the reflected or nonabsorbed light which gives us 
the sensation of colour. If light in the utraviolet/visible part 
of the electromagnetic spectrum is passed through a sam- 
ple in solution, some light energy may also be absorbed 


Table 3.1. Transitions in spectroscopy 


Type of 

transition Part of spectrum AE (J/mole) v (Hz) 
Electronic X-ray 4 x 10° 10'8 
Electronic Ultraviolet/visible 400 x 10° 105 
Vibrational Infrared 40 x 10° 10% 
Rotational Microwave 40 10!! 


Nuclear spin Radiowave 4 x 10-4 108 


(Figure 3.9). Molecules (or parts of molecules) capable of 
absorbing light are called chromophores. This is known as 
an absorption spectroscopy experiment and it is of use be- 
cause the particular frequencies at which light is absorbed 
are affected by both the structure and the environment of 
the chromophore. Light energy is used to promote electrons 
from the ground state to various excited states as shown 
in the Morse diagram in Figure 3.10. Each chemical struc- 
ture absorbs slightly different frequencies of light since each 
has a characteristic electronic structure (i.e. pattern of elec- 
tron distribution). The ground and excited electronic energy 
levels each contain many vibrational energy levels differing 
from each other by smaller energy increments than those be- 
tween the electronic energy levels (AE). Excited electrons 
can return to the ground state by vibrational transitions 
through these smaller energy increments (Figure 3.10). This 
excess energy is lost in collision with solvent molecules. 
The light energy absorbed therefore appears ultimately as 
heat in the solution. 

The absorption phenomenon may be quantified by the 
Beer—Lambert law (Equation (3.6)): 


be ney (3.6) 


where Jp is the intensity of incident light, I is the intensity 
of transmitted light, c is the molar concentration and / is 
the length of the light path (usually 1 cm). e is the molar 
extinction coefficient. The term [log 10/1] is the absorbance 
(A,) at a particular wavelength or frequency. A plot of A, or 
€ versus wavelength or frequency is known as an absorption 
spectrum (Figure 3.11). Wavelengths corresponding to max- 
ima in such spectra are denoted by Amax. Some useful Amax 
and e values are tabulated in Table 3.2. Because each elec- 
tronic energy level consists of several vibrational energy lev- 
els, a range of wavelengths is absorbed rather than one fixed 
wavelength. Under standard conditions, this spectrum is a 
fixed property of a pure chromophore and may therefore be 
used in identification of previously-characterized molecules. 
Moreover, since absorbance is directly dependent on molar 
concentration, it may be used as a measure of concentra- 
tion provided a standard curve for that chromophore is also 
available. This and other uses of the Beer-Lambert Law are 
described in Section 3.3.3. 

Absorbance is an especially appropriate scale for mea- 
suring interactions between light and molecules in solution 
in that, when no light is absorbed, this term has a value of 
0. When light is absorbed, the absorbance value increases. 
In theory, the parameter is linear and can reach a maximum 
of infinity. But, since absorbances of 1, 2 and 3 correspond 
to 10, 1 and 0.1% of light transmitted, respectively, in prac- 
tice it is usually not possible to measure absorbance accu- 
rately above 3.0. Moreover the accuracy of quantification 
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Figure 3.9. Ultraviolet/visible absorption spectroscopy experiment. Light of a single wavelength (A) and intensity (Jo) is passed through a 
sample held in a cuvette. Some of this light may be absorbed by the sample. This is detected as a decreased intensity of transmitted light (I) 


compared to incident light. Log Jo/J is measured as absorbance. 


decreases at high absorbance and the most linearly accurate 
measurements of absorbance are therefore obtained in the 
range 0-1.0. 

Some wavelengths are particularly useful in the study of 
biomolecules (Figure 3.11; Table 3.2). Amino acids have 
a strong absorbance around 210 nm and this is frequently 
used to detect peptides. The aromatic amino acids, tyrosine 
and tryptophan have relatively strong absorbance at 280 nm 
while nucleic acids absorb strongly at 260 nm. These wave- 
lengths are therefore widely used in studies of proteins and 
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Figure 3.10. Physical basis of absorbance. The electrons of a chro- 
mophore may be promoted to a higher energy level as a result of 
absorption of light energy (large arrow). The electron returns to 
the ground state by passing through various energy levels (small ar- 
rows). Vibrational energy levels of the ground and excited electronic 
energy levels overlap in flexible molecules. The small increments 
of energy released as the electron undergoes vibrational transition 
are lost in collisions with solvent molecules and appear as heat. 


nucleic acids, respectively. Reduction of NAD* to NADH + 
Ht causes a major increase in absorbance at 340 nm which 
is taken advantage of in assay of oxidoreductase enzymes. 

The absorbance spectrum for a chromophore under stan- 
dard conditions is only partly determined by its chemical 
structure. The environment of the chromophore also affects 
the precise spectrum obtained. The most important envi- 
ronmental factors affecting absorption spectra are pH, sol- 
vent polarity and orientation effects. These effects are espe- 
cially important in studies of biopolymers such as proteins 
and nucleic acids where chromophores may act as reporter 
molecules which can give information about their immedi- 
ate environment. These effects also mean that it is essential 
to determine absorption spectra under defined conditions of 
pH and solvent composition. 

Protonation/deprotonation effects resulting from pH 
changes or oxidation/reduction affect electron distribution in 
chromophores. This often results in dramatic differences be- 
tween the absorption spectra of the protonated and deproto- 
nated forms of the chromophore (Figure 3.12). Comparison 
of the spectra for protonated and deprotonated tyrosine re- 
veal wavelengths called isosbestic points where absorbance 
of both forms of the chromophore are identical. This can 
be useful if it is desired to quantify the total chromophore 
in solutions of different pH. It is often possible to find a 
wavelength where only one form of the chromophore (pro- 
totonated or deprotonated, reduced or oxidized) has a strong 
absorbance while the other does not. Measurements of ab- 
sorbance at this wavelength across a range of pH may be 
used to measure the pKa of the relevant group. 

Solvent polarity also affects the absorption spectrum 
determined for chromophores (Figure 3.13). Alternative 
solvents to water include aqueous solutions of dimethyl- 
sulphoxide, dioxane, ethylene glycol, glycerol and sucrose. 
Chromophores frequently give a slightly different spectrum 
in such solvents compared to water and this experiment is 
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Figure 3.11. Absorption spectra of some biomolecules. Absorbance (A) was measured at pH 5.0 and is plotted against wavelength (A) for 
some representative biomolecules. (a) 66 UM adenosine monophosphate (AMP). Note the strong absorbance at 260 nm which is characteristic 
of nucleotides. (b) 5 mM phenylalanine (solid line) and 0.7 mM histidine (dashed line). Note the Amax at 257 nm which is characteristic for 
phenylalanine. (c) 70 ug/ml cytochrome C. Note that cytochrome C has a red colour due to the presence of a haem group and has a Àmax at 


410 nm. (d) 65 ug/ml lysozyme. 


known as solvent perturbation. They are particularly useful 
in determining whether or not a chromophore is in contact 
with the solvent (e.g. on the surface of a protein or membrane 
or buried in the interior of the protein/membrane). 
Orientation effects are a spectroscopic consequence 
of the relative geometry of neighbouring chromophore 


a lower absorbance at this wavelength than single stranded 
polynucleotides. For this reason, absorbance measurements 
are useful in monitoring assembly or denaturation of 
nucleotides in vitro. 


OH 
molecules. A good example is the hypochromicity of nucleic I 

acids (Figure 3.14). A solution of free nucleotides has a A 

higher A260 (i.e. absorbance at a wavelength of 260 nm) than 4 qi RT "he 

an identical concentration assembled into a single-stranded pa “points CH—NH, 


polynucleotide. Double stranded nucleic acids, in turn, have 


Table 3.2. Some Amax values useful in biochemistry 


Coo~ PN, CoO 


Chromophore Amax (nm) £ (mM~! - cm™') 250 270 290 310 330 
Tryptophan 280 5.6 ( ke ) 
219 47.0 

Tyrosine 274 1.4 Figure 3.12. Absorbance spectra of tyrosine at pH 6 and 13. Ab- 
Phenylalanine 257 0.2 sorbance spectra of 1mM tyrosine at pH 6 (solid line) and 13 
Adenosine 260 14.9 (dashed line) are shown. The structure of the major form of the 
DNA‘? 260 6.6 chromophore at each pH is indicated. Note the Amax at 295 nm for 
RNA‘ 260 74 the deprotonated tyrosine anion which is missing from the spectrum 
Bovine serum albumin 280 40.9 of tyrosine. Isosbestic points are wavelengths where the absorbance 


“Calculated per mM of 330 Da repeating units. 


of both protonated and deprotonated forms of the chromophore are 
identical. 


à 
(nm) 

Figure 3.13. Solvent perturbation of haemoglobin. Absorption 
spectra are shown for 0.1 mg/ml haemoglobin in the presence 
(dashed line) and absence (solid line) of 20% sucrose. The sol- 
vent of reduced polarity (20% sucrose) enhances absorbance near 
410nm while decreasing absorbance near 220 nm. Note the simi- 
larity of these spectra to those of cytochrome C due to the presence 
of a haem group in both proteins. 


In most cases changes in spectra as a result of environ- 
mental effects are obvious on inspection. It is also possible 
to plot a difference spectrum which may be obtained by 
subtracting one spectrum (A) from the other (B), that is 
Ea — Eg Or Aa — Ag Versus wavelength or frequency 
(Figure 3.15). This facilitates identification of wavelengths 


230 270 310 
À 


(nm) 


Figure 3.14. Hypochromicity of nucleic acids. This is an example 
of an orientation effect in spectroscopy. Spectra in the region of 
260 nm are shown for identical concentrations of free nucleotides 
(solid line), single-stranded DNA (dotted line) and double-stranded 
DNA (dashed line). As the nucleotides are assembled into progres- 
sively more ordered structures, A260 decreases. 
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Figure 3.15. Difference spectroscopy. (a) Absorption spectra are 
shown for 0.2 mM NAD? (solid line) and NADH (dashed line). 
Note the Amax at 340 nm specific for the reduced NADH. (b) Dif- 
ference spectrum, that is plot of AA(AnapH — ANAp+) versus 2. 


or frequencies where differences between the two spectra 
are greatest (Section 3.6.4; Example 3.5). 

Another way in which absorption spectra may be pre- 
sented is as derivative spectra. This involves plotting the 
rate of change of absorbance dA, as a function of i. 
It is particularly useful for high-resolution comparison of 
spectra. 


3.3.2 Equipment Used in Absorption Spectroscopy 


Absorption spectra are measured in a spectrophotometer, 
the basic outline of which is shown in Figure 3.9. Electro- 
magnetic radiation is generated by a lamp which contains a 
metal filament through which an electric current flows. For 
wavelengths in the visible range a tungsten-halogen lamp 
(290-900 nm) is used while deuterium lamps provide both 
ultraviolet and visible radiation (210-370 nm). The light 
emitted from these sources will consist of a wide variety of 
wavelengths. It is necessary to select a single wavelength 
by passing the light through a monochromator of some sort. 
In modern instruments, this is usually a diffraction grating 
which achieves the same effect as the prism used by New- 
ton. This monochromatic light is then used as the incident 
light in the absorption experiment. 

Transmitted light is detected by a photodetector of which 
the most popular are photomultiplier tubes and photodi- 
ode array detectors. Photomultiplier tubes convert light 
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Figure 3.16. Some formats for spectrophotometers. (a) Single-beamed. (b) Split-beamed. Half-mirrors allow some incident light to pass 
through while reflecting the rest. Two beams of light (A and B) are therefore generated from a single source. One of the cuvettes provides a 
blank measurement which is automatically subtracted from that of the cuvette containing sample. Note that, at any given moment in time, 
the detector only measures light from beam A or B as determined by the beam selector. (c) Dual-beamed. Two sources generate individual 


beams. 


intensity into an electric signal by amplifying the signal 
with a cascade of electrons. Photodiode arrays are cheaper 
alternatives to photomultiplier tubes and are composed of 
silicon crystals arranged in a linear array which are sensitive 
to light in the à range 190-820 nm. Absorption of light by 
a photodiode produces a current proportional to the number 
of photons absorbed. Photodiodes allow extremely fast de- 
tection of light intensity. This type of detector is a feature 
of the diode array spectrophotometer in which the detector 
consists of an array of several hundred individual photodi- 
odes. Each individual photodiode is assigned a particular 
wavelength across the range of the instrument. In the case 
of instruments with 1 nm resolution, each photodiode is as- 
signed a wavelength 1 nm apart. In an instrument with 2 nm 
resolution each photodiode is assigned wavelength values 
2nm apart. This detector offers much faster scanning of 
spectra and greater sensitivity than is achievable with pho- 
tomultiplier detectors. 

A variety of design formats are available for spectropho- 
tometers including single beam, split (double) beam and 
dual beam instruments (Figure 3.16). Use of a single, fixed 
wavelength may be appropriate in some applications. This is 
achieved by replacing the monochromator with filters which 
allow a single wavelength of incident light pass through to 
the sample. Examples of this include microtitre plate read- 
ers (which are used to take readings from 96-well microtitre 
plates) and variable wavelength detectors of the type used 
in chromatography (Chapter 2). 

Sample is contained in tubes called cuvettes which may 
be constructed from a number of materials. Glass or plastic 


cuvettes are useful in the visible part of the spectrum but 
have the disadvantage that they may absorb ultraviolet ra- 
diation. Quartz cuvettes do not absorb in the ultraviolet but 
are fragile and expensive. Disposable plastic cuvettes which 
do not absorb at wavelengths greater than 280 nm are now 
commercially available. Plastic cuvettes may be damaged 
by exposure to organic solvents which can limit their use- 
fulness in certain types of experiments. Because the cuvette 
(and/or solvent) may have its own absorption spectrum, it is 
essential to determine a blank spectrum of cuvette plus sol- 
vent and to subtract this from that of solvent containing the 
chromophore to obtain a true spectrum for the chromophore. 
Frequently this is achieved in a dual beam instrument with 
one beam passing through the blank cuvette and the other 
through the analytical sample. 

Modern spectrophotometers are usually connected to a 
computer which facilitates storage, presentation and analy- 
sis of spectroscopic data. 


3.3.3 Applications of Absorption Spectroscopy 


Since many biomolecules possess distinct absorption spec- 
tra, absorption spectroscopy has a wide range of applica- 
tions in biochemistry. Advantage is frequently taken of the 
Beer—Lambert law (Equation (3.6)) to carry out quantita- 
tive measurements. If the value of € is known, it is pos- 
sible to calculate concentrations directly from absorbance 
readings at specific wavelengths. In enzyme assays (Fig- 
ure 3.17) the concentration of either substrate or prod- 
uct may be measured to allow calculation of rates of 
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Figure 3.17. Use of spectroscopy in enzyme asays. (a) Continuous assay of a dehydrogenase producing NADH + H+. The rate (uMoles/min) 
may be calculated from enapu and the AA349 per minute (dashed arrow). (b) Stopped assay of alkaline phosphatase. This enzyme produces 
4-nitrophenol from 4-nitrophenyl phosphate. A sample is taken from the assay at various times (points on graph) and NaOH is added. This 
halts the enzymatic reaction by rapidly increasing the pH and, simultaneously, converts 4-nitrophenol into the 4-nitrophenolate anion which 


has a strong absorbance at 405 nm. The rate is calculated as in (a). 


enzyme-catalyzed reactions. These measurements may be 
continuous (i.e. the reaction is monitored continuously in the 
light-beam of a spectrophotometer) or stopped (i.e. the re- 
action is halted at various time-points and spectroscopic 
measurements are performed separately from the reaction 
producing product or consuming substrate). For example, 
reduction of NAD to NADH + Ht by oxidoreductase en- 
zymes is conveniently followed by continuous measurement 
of A340. NADH has a strong Amax at this wavelength while 
NAD does not (Figure 3.15). Other quantitative applications 
of absorbance measurements include measurement of con- 
centrations of biomolecules with the aid of a standard curve 
(Figure 3.18). 

A second type of application involves structural studies of 
biopolymers such as proteins and DNA. These are complex 
biomacromolecules and their assembly and unfolding are 
of considerable research interest. Chromophores, which are 
part of these molecules, are very sensitive to their immediate 
environment. We can often follow assembly/denaturation 
processes in such molecules, therefore, simply by moni- 
toring absorbance changes at particular wavelengths, since 
these processes will alter the precise environment of the 
chromophore (Figure 3.19). 

Under standard conditions, absorption spectra are charac- 
teristic for specific biomolecules. Such spectra may there- 
fore provide a useful chemical ‘fingerprint’ for compari- 
son of biomolecules (especially of low molecular mass) to 


each other. This makes possible confirmation of identity be- 
tween molecules perhaps purified from different sources. 
Conversely, differences between spectra are good evidence 
for structural difference. 
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Figure 3.18. Determination of protein concentration using a stan- 
dard curve — the Bradford assay. Coomassie brilliant blue has a 
Amax at 596 nm when it complexes with protein. A standard curve is 
constructed with various known concentrations of bovine serum al- 
bumin. The concentration of an unknown sample may be estimated 
by comparison with this curve (arrows). The use of a standard 
curve for the relationship between A, and concentration depends 
on the Beer—Lambert law. It is commonly used for estimations in 
biochemistry. 
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Figure 3.19. Spectroscopic determination of denaturation/refolding in biopolymers. (a) Thermal denaturation of ribonuclease A as de- 
termined by a decrease in A287. Since unfolded protein has a lower A287 than folded, this gives a measure of denaturation. (b) Thermal 
denaturation of Pseudomonas aeruginosa DNA. Denaturation leads to an increase in A260 due to the hypochromicity of DNA. The melting 
temperature (Tm) may be calculated as the half-way point between completely folded and completely unfolded DNA (dashed line). There 
is a direct correlation between Tm and the % G + C content of DNA (inset). 


Chromophores such as aromatic amino acids and nu- 
cleotide bases form part of the structure of biomacro- 
molecules such as proteins and DNA. For this reason, they 
are referred to as intrinsic chromophores. It is also possible 
chemically to attach artificial groups with strong absorption 
spectra to proteins. Such groups are extrinsic chromophores 
and may be used as reporter groups because their spectra will 
be affected by their specific environment. Ideally, reporter 
groups should have a single site of attachment to the target 
macromolecule and should not affect its normal structure 
and function. 

Absorption measurements are frequently the basis of 
detection methodologies in biochemistry. Such measure- 
ments may be made directly or indirectly. The variable 
wavelength detector used in chromatography systems 
mentioned above is a good example of direct spectroscopic 
detection (Section 3.3.2). Measurements at 210, 280 and 
260nm allow convenient on-line detection of peptides, 
proteins and DNA, respectively. Most biomolecules have 
relatively low intrinsic absorbance values, however, which 
limit their sensitivity in detection systems. Moreover, not 
every molecule in a complex mixture will have the same 
value for ¢ at a given wavelength. For these reasons, indirect 
measurements of a particular extrinsic chromophore with 
a strong absorption are widely used in detection systems. 
Examples include chromogenic substrates used in the 
detection of enzymes such as _horseradish-peroxidase 
and (-galactosidase (Table 3.3). These are nonbiolog- 
ical substrates with weak absorption spectra which are 
enzymatically converted into chromophores with strong 
spectra. Conjugation of such enzymes to specific antibodies 
provides the basis for highly sensitive immunodetection in 
commercial diagnostic kits. High sensitivity is conferred 
by signal amplification, that is the production of many 
molecules of product by a single molecule of enzyme. 


3.4 FLUORESCENCE SPECTROSCOPY 


3.4.1 Physical Basis of Fluorescence 
and Related Phenomena 


The Morse diagram shown in Figure 3.10 explains the appar- 
ent disappearance of light energy in absorption spectroscopy 
in terms of appearance of small increments of heat energy 
which are lost in kinetic collisions in solution. Because vi- 
brational energy levels of the ground and excited states of 
most chromophores overlap, they can return to the ground 
state by a series of nonradiative transitions. Some chro- 
mophores are quite rigid and inflexible molecules however, 
and may have a limited range of vibrational energy levels. In 
such molecules, the vibrational energy levels of the excited 
state often do not overlap with those of the ground state (Fig- 
ure 3.20). When chromophores of this type absorb light, it 
is not possible for them to return to the ground state by sim- 
ply losing their excess energy as heat. Instead, they undergo 
a radiative transition in which a portion of the absorbed 
energy is re-emitted as light. This phenomenon is called 
fluorescence and such chromophores are called fluors or flu- 
orophores. Since at least some of the light energy initially 
absorbed is lost in transitions between vibrational energy 
levels, the light energy emitted is always of longer wave- 
length (i.e. lower energy) than that absorbed. This means 
that fluors have a characteristic fluorescence or emission 
spectrum as well as a characteristic absorbance spectrum 
(Figure 3.21). 

The excited state lifetime of a fluor (i.e. the period of 
time spent in the excited state) is usually very short (ranging 
from 0.5 to 8ns). However, in certain cases the period of 
time spent in the excited state may be significantly longer 
ranging up to 2s. The latter situation can arise as a conse- 
quence of a phenomenon associated with electrons called 


Table 3.3. Some chromogenic enzyme substrateas 
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CH3, CH3 
CH3 CH3 
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Product is blue against a white 
background (activity stain) 


CH3 CH3 
PEROXIDASE 
HO) 2H,0 

CH; CH; 


Amax = 450 nm 


Note: These substrates have negligible absorbance at Amax but, when converted into product by the enzymes, a strong absorbance is detected 


at this wavelength. 


magnetic spin which will be described in more detail later 
(Section 3.7). In most atoms, pairs of electrons in a single 
orbital have spins oriented in opposite directions as a result 
of the Pauli exclusion principle. Such atoms are said to exist 
in a singlet state and the electrons are said to be paired. If 
an electron is promoted to a higher energy orbital however, 
then its spin can be oriented either in the same or opposite 
orientation as the electron left in the original orbital (i.e. 


parallel or antiparallel). When placed in a magnetic field, 
this atom can have three distinct energy levels which may 
be designated —1, 0 and +1, that is it now exists in a triplet 
state. Some heavy atoms possessing even numbers of elec- 
trons in their singlet ground state can have two unpaired 
electrons stably occupying different orbitals in a triplet state 
of slightly higher energy than the ground state (e.g. molec- 
ular oxygen). In such atoms, it is possible for a transition to 
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Figure 3.20. Physical basis of fluorescence. If a chromophore has 
a comparatively rigid structure with few vibrational energy levels 
such that the electronic energy levels of the ground and excited 
states do not overlap (cf Figure 3.10), a radiative transition may 
occur (dashed arrow) which is called fluorescence. This results 
in emission of radiation of frequency v, and wavelength A. Such 
molecules are called fluors. Phenomena related to fluorescence in- 
clude phosphorescence and chemiluminescence (see text). 


occur between the singlet ground state (where the electrons 
are antiparallel and paired) to an excited state (where the 
electrons are parallel and unpaired). When this molecule re- 
turns to the ground state a radiative triplet-singlet transition 
occurs which is a much slower process than fluorescence 
called phosphorescence. Radiation emitted as a result of 
this transition is at an even longer wavelength than radia- 
tion emitted during fluorescence because the lowest-energy 
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Figure 3.21. Absorbance and fluorescence spectra. Absorbance 
(solid line) and fluorescence (dashed line) spectra of (a) tryptophan 
and (b) tyrosine. 


triplet state typically has only slightly higher energy than 
the lowest-energy singlet state. 

A third phenomenon resulting in radiative transitions 
called chemiluminescence occurs in molecules which can 
be promoted to an excited state as a result of a chemical 
reaction and which then return to the ground state with the 
emission of light. Molecules such as luciferin and luminol 
(Table 3.4) show this property and, when used as substrates 
by the enzymes firefly luciferase and peroxidase, respec- 
tively, emit light which may be measured or detected (Sec- 
tion 3.4.2). These reactions are highly efficient and may 
be readily incorporated into sensitive assay systems or in 
staining of protein or nucleic acid blots (Figure 3.22). For 
example the level of ATP (or of any biomolecule which 
may be coupled to ATP production) may be quantified as 
follows: 


Luciferin + ATP + O, 
4 (Firefly Luciferase) 
Oxyluciferin + AMP + pyrophosphate + CO, + light 


This assay is extremely sensitive allowing detection at 
the pM/fM level. Since peroxidase-conjugated antibodies 
are extensively used in immunodetection systems, luminol 
in particular has wide applications in this general area (e.g. 
Section 5.10.2). Both phosphorescence and chemilumines- 
cence are related to fluorescence in the sense that all three 
phenomena result in emission of light. 

Fluors and molecules capable of chemiluminescence are 
often found to possess complex ring structures which are 
largely responsible for their rigidity (Table 3.4). Most com- 
ponents of biopolymers are nonfluorescent and this means 
that fluorescence may give information about processes oc- 
curring in specific parts of the biopolymer. For example, 
the most important intrinsic fluors in proteins are tryp- 
tophan, tyrosine and phenylalanine residues (in practice, 
tryptophan and tyrosine give stronger spectra than pheny- 
lalanine). Fluorescence of tyrosine is frequently quenched 
as a result of proton transfer in the excited state (see be- 
low). These residues have average occurrences of only 
3.5, 1.1 and 3.5% in proteins, respectively. Accordingly, 
many proteins with extensive absorbance spectra may have 
only one or two fluorescent residues and advantage can 
be taken of this in the study of biochemical processes in- 
volving the protein. The bases of DNA nucleotides and of 
some enzyme cofactors (e.g. NAD) are also intrinsic fluors 
although they display generally weak fluorescence spec- 
tra. In such cases, extrinsic fluors may be used to carry 
out structural and functional studies. These are nonbio- 
logical molecules which possess fluorescence properties 
(Table 3.4). 
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Table 3.4. Chemical structures of some common fluors and chemiluminescent compounds 


Compound Structure Use 


Tyrosine Intrinsic fluor (proteins) 


Tryptophan Intrinsic fluor (proteins) 


1-Anilino-8-naphthalene sulphonate Extrinsic fluor (proteins) 


(ANS) 


Fluorescein Extrinsic fluor (proteins) 
Ethidium bromide Extrinsic fluor (DNA) 
Acridine orange Extrinsic fluor (DNA) 
Luminol (3-Aminophthalhydrazine) NH, O Chemiluminescent substrate 
(peroxidase) 
NH 
NH 


Luciferin N N COOH Chemiluminescent substrate 
Í 5< T (firefly luciferase) 
HO S 
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Figure 3.22. Use of chemiluminescence in specific staining of protein and nucleic acid blots. Peroxidase may be covalently attached to an 
antibody or oligonucleotide as shown. After transfer of (a) protein or (b) nucleic acids to a suitable membrane (Chapter 5), specific proteins 
or nucleic acids may be visualized by staining with luminol. Light emitted by chemiluminescence may be detected ob photographic film. 


3.4.2 Measurement of Fluorescence 
and Chemiluminescence 


Fluorescence measurements are performed in a spectroflu- 
orimeter, an outline diagram of which is shown in Figure 
3.23. An incident beam of radiation of a given wavelength 
is passed through a sample cuvette containing the fluor. 
Emitted radiation (of longer wavelength than that of the 
incident beam) is detected by a photomultiplier tube. This 
is generally similar to the design of the spectrophotometer 
previously described (Figure 3.9). The main differences are 
that emitted radiation is detected at 90° to the direction of 
the incident light beam and that a second monochromator is 
required to select for the different wavelength of the emitted 
light. Since fluorescence is emitted in all directions from 
the fluor, this design excludes inadvertent detection of the 
incident beam. A consequence of this arrangement is that 
cuvettes used in fluorescence measurements must have four 
clear sides whereas those used in absorbance measurements 
may often have frosted or blackened side-walls. 

Chemiluminescence is measured in a luminometer (Fig- 
ure 3.24). The reaction leading to light emission is carried 
out in a cuvette and the wavelength and intensity of emitted 
light is detected. When part of an in situ detection system 
(e.g. western blotting), the emitted light can be detected on 
photographic film. 


Only a proportion of the light energy originally absorbed 
is emitted as radiation, since some energy may be lost in vi- 
brational transitions as previously described (Figure 3.20). 
Two further processes can diminish or quench the amount 
of light energy emitted from the sample. Internal quench- 
ing is due to some intrinsic structural feature of the excited 
molecule involving, for example, structural rearrangement. 
External quenching arises either from interaction of the ex- 
cited molecule with another molecule present in the sample 
or else absorption of exciting or emitted light by another 
chromophore present in the sample. All forms of quench- 
ing result in nonradiative loss of energy. The amount of 
energy lost due to both vibrational transitions and inter- 
nal quenching will be a property of the particular molecule 
since these processes arise as a result of chemical structure. 
External quenching may be due to contaminants present 
in preparations or, alternatively, may be deliberately intro- 
duced into the experiment. Acrylamide, iodide and ascorbic 
acid are examples of molecules capable of acting as external 
quenchers. 

We can quantify fluorescence by the quantum yield, Q: 


Number of photons emitted 


Q 


= 3.7 
Number of photons absorbed a 
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Figure 3.23. Ultraviolet/visible spectrofluorimeter. Light of a single wavelength (1) is passed through and absorbed by a sample held in 
a cuvette as described in Figure 3.9. At certain values of i, light is absorbed and the molecule fluoresces. Fluorescent light is emitted in all 
directions (dashed lines) and is measured at 90° to the incident beam. Emitted light, which is of longer wavelength (A2) than absorbed light, 


is passed through a second monochromator and detected. 


Under a given set of conditions, Q will usually have a 
fixed value for a particular fluor, with a maximum possible 
value of 1. However, because it is experimentally difficult 
to measure Q accurately, we often use relative measure- 
ments of fluorescence in practice. The possibility of ex- 
ternal quenching has the important practical consequence 


Syringe —— - vee 
hotomultiplicr Amplifier 


man j 


Mixing chamber 


Detector 


Figure 3.24. Outline design of a luminometer. Reactants (e.g. fire- 
fly luciferase and luciferin) are introduced into a mixing chamber 
by a syringe or similar device. When chemiluminescence occurs, 
light is emitted in all directions (dashed lines). A beam of such 
light is selected by a slit and collected by a photomultiplier tube. 
The electrical signal from this is amplified and passed to a detector. 
Note that the reaction chamber is protected from background light 
and is temperature-controlled. 


that great care must be taken in carrying out fluorescence 
measurements to ensure the absence of quenchers from the 
sample and all solutions used. These may therefore require 
prior purification and filtration. 

Fluorescence is usually much more sensitive than ab- 
sorbance spectroscopy since the lower limit of detection of 
a spectrophotometer is determined by the ability of the in- 
strument to distinguish between Jp and 7 (Equation (3.6)). 
By contrast, the lower limit of detection of a spectroflu- 
orimeter is set by the ability of the instrument to detect 
light (i.e. to distinguish J from 0). Use of a more pow- 
erful light-source and a more sensitive detector increases 
the sensitivity of spectrofluorimeters while giving little im- 
provement in sensitivity of spectrophotometers. Moreover, 
most chromophores lack fluorescence properties altogether 
and even large macromolecules such as proteins may con- 
tain a small number of intrinsic fluors. A further advantage 
of fluorescence over absorption spectroscopy is the fact that 
fluorescence spectra are even more sensitive to environ- 
mental effects than absorption spectra. For these reasons 
fluorescence is generally more sensitive and specific than 
absorbance. 


3.4.3 External Quenching of Fluorescence 


External quenching may involve direct contact between the 
excited molecule and external quencher. This is possible 
by two main mechanisms. Dynamic quenching involves 
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Figure 3.25. Mechanisms of fluorescence quenching. (a) Colli- 
sional or dynamic quenching. The quencher collides with the ex- 
cited fluor leading to loss of some energy from the excited state as 
kinetic energy. (b) Static quenching. The quencher and excited fluor 
form a stable complex called a ‘dark’ complex. Some energy from 
the excited state is lost during this process. 


collision between the two molecules with the fluor losing 
energy as kinetic energy. Static quenching involves a more 
long-lasting formation of a complex between the fluor and 
quencher, which is sometimes referred to as a dark complex 
(Figure 3.25). It is often difficult to distinguish these two 
processes experimentally, but the following discussion will 
show how this may be achieved. 

An excited fluor (X*) can lose energy by three general 
processes described by the following relationships: 


1. The fluor can fluoresce (FI): 
ky f 
X x —— X + light (3.8) 


2. The fluor can experience internal quenching (IQ): 


Se x (3.9) 


3. The fluor can experience external quenching (EQ) in 
the presence of a specific concentration of external 


quencher [q]: 


X*+ pe 
q 


(3.10) 

These processes are governed by rate constants kı — ks, 
respectively. The overall fluorescence measured, F, is given 
by the ratio of fluorescence to the sum of these three possi- 
bilities: 


Fl 


F = F+104 50 oe) 


If little internal or external quenching occurs, the quantum 
yield may approximate closely to 1 (although, depending 
on the fluor, it is usually less than 1 due to nonradiative 
vibrational transitions as discussed above). In the presence 
of an external quencher, the value of F will be smaller than 
in the absence of quencher. The rate of each of the three 
processes can be calculated from simple reaction kinetics 
and substituted into Equation (3.11): 

ky [X*] 


F= (3.12) 
ky - [X*] + kp - [X*] + ks - [X*] - [q] 


where square brackets denote concentration. This relation- 
ship can be simplified further by canceling out the common 
term [X*]: 


kı 


ES (3.13) 
ki +k + ks- [q] 


The fluorescence in the absence of q (i.e. [q] = 0), Fo, is 
given by: 


r= (3.14) 
e b+ 
The ratio between Fp and F is therefore: 
Fo ky +k) +k3-[q] 
= 3.15 
F ki +k B15) 
EE i (3.16) 
as oe Sab ; 


The value for (1/k, + k2) is the excited state lifetime 
of the fluor and would be expected to be constant for a 
particular molecule. It can be designated as T. Therefore: 


P itk- [g] T (3.17) 


F 


This is known as the Stern—Volmer relationship and it is 
used widely in analysis of fluorescence quenching data. The 


product of k3 and T is defined as the dynamic Stern—Volmer 
constant, Ksy: 


Ksy =k3-T (3.18) 

A plot of the ratio of fluorescence intensity in the absence 
of external quencher (Fo) to that in the presence of exter- 
nal quencher (F) as a function of concentration of external 
quencher ([q]) is expected to be linear with a slope equal 
to Ksy and an intercept of 1. Such a graph is known as a 
Stern-Volmer plot and an example is shown in Figure 3.26. 
In practice such plots often show deviations from linearity 
(i.e. curving) because of a variety of factors such as hetero- 
geneity in fluor excited state, a mixture of quenching mech- 
anisms and deviation from ideal behaviour. Such situations 
are beyond the scope of the present discussion but, in ana- 
lyzing such processes, modified forms of the Stern—Volmer 
equation are used. 

Where quenching arises as a result of a collisional pro- 
cess, k3 is the dynamic quenching rate constant. However, 
when a stable bimolecular complex is formed leading to 
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Figure 3.26. Experimental distinction of mechanism of quenching. 
(a) Stern—Volmer plot for both static and dynamic quenching is 
often linear. (b) A linear dependence of Ksy on T/n is diagnostic 
of dynamic quenching. (c) A linear dependence of log Ksy on 1/T 
is diagnostic of static quenching. 
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quenching: 
F+q-— Fq (3.19) 
The ratio between Fy and F is given by; 
Fo 
2 = 1+ Koy: fal (3.20) 


where Ksv is the association constant governing the com- 
plex formed in 3.19. Dynamic and static quenching there- 
fore give Stern—Volmer plots of the same mathematical 
form and the two mechanisms cannot be distinguished by 
Stern—Volmer plots alone. In practice, it is necessary to mea- 
sure Fọ/F under a range of conditions of temperature and 
viscosity. 

A major difference between dynamic and static quench- 
ing is the fact that temperature affects the two processes 
in opposite ways as described in more detail below. Dy- 
namic quenching depends on collisions between the ex- 
cited fluor and the quencher and results in a decrease in 
the excited lifetime, T. It is a diffusion-controlled pro- 
cess which increases with temperature. Static quenching, 
on the other hand, results from formation of a complex be- 
tween a portion of the fluor and the quencher which leaves 
the bulk of the fluor unquenched. This process does not 
affect T and, moreover, occurs less efficiently at higher 
temperatures since the fluor-quencher complex is likely 
to be less stable under these conditions. It is to be ex- 
pected, therefore, that static quenching decreases while dy- 
namic quenching increases with temperature and that T is 
unaffected in static quenching but decreases in dynamic 
quenching. 

Solvent viscosity strongly affects dynamic quenching and 
can be used experimentally to distinguish it from static 
quenching. The dynamic quenching rate constant, k3, is re- 
lated to the bimolecular rate constant for collision, ko, by 
the Smoluchowski equation: 

ks =y-ko (3.21) 
where y is the quenching efficiency which has values in the 
range 0-1. ko can be related to physical attributes of the 
fluor/quencher pair by the following relationship: 


p = ETN- X: (Dt Dy) 
m 1000 


(3.22) 


where N is Avogadro’s number, X is the distance of closest 
approach between the fluor and quencher (i.e. the sum of 
their molecular radii) and Dp and D, are the diffusion co- 
efficients of the fluor and quencher, respectively. For dilute 
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solutions, the diffusion coefficient, D, can be approximated 
by the Stokes—Einstein equation as follows: 


k-T 
= ——— (3.23) 
6-1-n-r 

where 7 is the viscosity of the solution, r is the radius of 
the molecule, k is Boltzmann’s constant and T is absolute 
temperature. Thus, for an efficient quencher obeying only a 
dynamic quenching mechanism, it is expected that k3 should 
vary linearly with the ratio T /n. 

By contrast, a quencher obeying only a static quenching 
mechanism gives a slope determined by the association con- 
stant, Ksy, which would be expected to obey the van ‘tHoff 
equation since it is an equilibrium constant: 


d log Ksy _ AH GB 24) 
d(1/T) ; 


~ 2.3-R 


where AZ is the enthalpy change associated with complex 
formation and R is the Universal Gas Constant. 

This means that measurements of fluorescence quenching 
may be made under varying conditions of temperature and 
viscosity. In the case of solely dynamic quenching the slope 
of Stern-Volmer plots is expected to vary linearly with T/7 
while in solely static quenching, its log varies linearly with 
1/T (Figure 3.26). In practice, a particular fluor/quencher 
pair may interact by acombination of these two mechanisms. 
For details of analysis of such complex situations literature 
in the bibliography at the end of this chapter should be 
consulted. 

Fluorescence spectra are strongly influenced by exposure 
to solvent and/or quenchers present in the solvent. This fact 
underlies much of the practical usefulness of fluorescence 
spectroscopy in biochemistry. For example, slight changes 
in the spatial position or solvent exposure of intrinsic protein 
fluors such as tryptophan can result in either an increase or 
a decrease of quantum yield as well as qualitative changes 
in the spectrum such as a shift to longer wavelengths (a 
red shift). Fluorescence measurements may be performed at 
single emission wavelengths or, more informatively, may be 
acquired across the whole spectrum. Moreover, fluorescence 
measurements may be made under steady-state conditions 
(i.e. at a single time) or may be time-resolved (i.e. mea- 
surements are taken at various times along a time-course). 
In order to give an indication of the versatility of this tech- 
nique, we will now describe some of the main applications 
of fluorescence in biochemistry. 


3.4.4 Uses of Fluorescence in Binding Studies 


Binding of ligands to proteins frequently causes changes 
to their three-dimensional structure. Examples of this in- 


I/AF 
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Figure 3.27. Use of intrinsic fluorescence measurements for de- 
termination of ligand Kp in binding proteins. Changes in intrinsic 
fluorescence (AF) as a result of ligand binding to a protein are 
determined. Plots of 1/AF versus 1/[Ligand] are linear. The Kp 
may be determined from the intercept of this line on the abscissa 
(Example 3.2). 


clude the binding of substrates, inhibitors, cofactors or al- 
losteric modulators to enzymes or of hormones to recep- 
tors. If this structural change has an effect on the envi- 
ronment of an intrinsic or extrinsic fluor in the protein, 
this can result in measurable changes in the fluorescence 
spectrum. Provided that the fluor has a unique location in 
the protein, such changes of fluorescence at a particular 
wavelength (AF) can be used to determine the dissocia- 
tion constant (Kp) of the protein for the ligand where Kp 
is a measure of the affinity of the protein for the ligand 
(Figure 3.27): 


P+L2PL 
_ [P]; [L] 
oS pr Hen 


Some extrinsic fluors such as 1-anilino 8-naphthalene 
sulphonate (ANS; Table 3.4) are only strongly fluorescent 
when bound to proteins. ANS is attracted to and will bind at 
slightly hydrophobic pockets on the protein surface. Compe- 
tition experiments can be carried out in which this fluor com- 
petes with other ligands (Figure 3.28). Fluorescence mea- 
surements of both intrinsic and extrinsic fluors can therefore 
be used to estimate Kp values for ligands in proteins (Ex- 
ample 3.2). This technique complements other methods of 
Kp determination such as equilibrium dialysis (Chapter 7; 
Section 7.2.1). 


(a) 200 
100 \ [Ligand] 
50 \ (uM) 
0 
I/AF 
1/ANS 
(M7!) 
(b) 200 
[Ligand] 
(uM) 
100 
IAF 50 
0 
I/ANS 
(uM) 


Figure 3.28. Competition between ANS and ligand for binding 
site on protein. Change in fluorescence (AF) is measured in the 
presence of various concentrations of ANS and ligand. (a) ANS 
and ligand bind at different sites. (b) ANS and ligand compete for 
the same site (lines intersect on ordinate). In practice, other patterns 
of intersecting lines may be observed but the pattern observed in 
(b) is diagnostic for competitive binding. 


3.4.5 Protein Folding Studies 


Fluorescence is one of the main spectroscopic techniques 
used in the study of protein folding (for a more detailed dis- 
cussion of this topic see Chapter 6). Folded proteins are very 
compact structures in which individual residue side-chains 
adopt highly-defined spatial locations. When a protein un- 
folds from the native (N) to the unfolded (U) state, these 
side-chains become free to rotate around C, and Cg -bonds 
and move freely in solution. Tryptophan (the main intrinsic 
fluor of proteins) is normally buried in the interior hydropho- 
bic core of the protein. When the protein unfolds, tryptophan 
comes into contact with solvent water and may also be able 
to interact with quenching agents. 

The effects of unfolding on intrinsic fluorescence are in- 
dividual for each protein but four main experimental param- 
eters are usually observed; (1) The fluorescence intensity 
may increase or decrease as a result of unfolding, thus 
causing an increase or a decrease in Q. (2) The fluorescence 
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lifetime, T, may decrease as a result of dynamic quenching 
(e.g. encounters with disulfides or amines) or be unchanged 
as a result of static quenching. (3) The radiation emitted 
from an unfolded protein is usually of lower energy and 
therefore of longer wavelength (i.e. the spectrum is shifted 
to the red end of the spectrum). (4) The fluor is less con- 
strained in terms of rotational movement around C, and 
Cg-bonds in the unfolded state than in the folded state. 
This means that interactions between the fluor and plane- 
polarized light (Section 3.5) are different in the folded and 
unfolded versions of the protein. A parameter called polar- 
ization is a good measure of this conformational mobility of 
tryptophan and is /ower in unfolded proteins than in folded 
proteins. 

A further measure of solvent-exposure of tryptophan is 
quenching of fluorescence by an external quencher such as 
potassium iodide. This agent, which may be dissolved in 
solvent water, has a strong negative charge which precludes 
it from diffusing into the hydrophobic interior of folded or 
partially-folded proteins. Quenching of fluorescence by KI 
is therefore good evidence for exposure of tryptophan to 
solvent during protein unfolding. 

Fluorescence is not the only experimental measure of 
protein folding since absorbance spectroscopy and circular 
dichroism (CD; Section 3.5.3) are also widely used. If pro- 
tein folding approximates closely to a two-state model (i.e. 
N —> U), these independent methods should give coincident 
estimates of unfolding. However, differences in estimates 
between the different techniques may be important experi- 
mental evidence for deviation from a two-state model and 
for the existence of other species such as folding intermedi- 
ates (Chapter 6). 


3.4.6 Resonance Energy Transfer 


Since a fluor has characteristic Amax values in both its ab- 
sorbance and emission spectra, it is possible to establish 
an experiment in which the emission Aa, of one fluor (A) 
overlaps with the absorbance Amax of a second fluor (B). 
If these separate fluors (extrinsic or intrinsic) have unique 
locations in a protein or macromolecular complex, it is pos- 
sible for emission light energy from fluor A to be absorbed 
by fluor B and to be emitted as part of B’s emission spec- 
trum (Figure 3.29). This phenomenon is called resonance 
energy transfer and, since it is strongly dependent on the 
distance, R, between the fluors, it may be used to measure 
distances in proteins, membranes and macromolecular as- 
semblies especially in the range 10-80 A. The efficiency of 
the transfer process, E, may be expressed in a number of 
ways as follows: 


kr 


Panar (3.26) 
kp thy +H 


74 PHYSICAL BIOCHEMISTRY 


Example 3.2 Use of fluorescence in binding studies 


12547 


0 


-0.25 


-1/Kp 


The glutathione transferases (GSTs) have extensive binding properties which may help to explain why they are the most 
abundant cytosolic proteins in rat and human liver, comprising some 3-5% of total protein. They are able to bind foreign 
compounds such as drugs and dyes as well as a range of naturally-occurring, generally hydrophobic ligands such as 
cholic acid and bile salts. The most abundant GSTs possess a small number (2-4) of tryptophan residues which act as 
intrinsic fluors. Binding of ligands can be conveniently followed by measuring fluorescence quenching. This is because 
ligand binding causes a slight change in the location of tryptophan. If the change in fluorescence intensity (A F) is plotted 
against the ligand concentration as a double-reciprocal plot, the dissociation constant (Kp) of the protein for the ligand 
can be conveniently determined. The following graph shows a typical plot for binding of bilirubin to GST 1-2. 
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where kr, kp and k’ are, respectively, the rates of transfer of 
excitation energy, of fluorescence and the sum of all other 
de-excitation processes such as vibrational transitions. 


= i (3.27) 
~ RE + R® 
where R is the distance between the donor and acceptor 
molecules and Ro is a constant related to the donor-acceptor 
pair which can be calculated from their absorption and emis- 
sion spectra. In practice, E can be determined either from 
the fluorescence intensity (F) or the excited state lifetime 
(T) of the donor determined in the presence (da) and absence 
(d) of the acceptor as follows: 


Faa 

E=1-— (3.28) 
Fa 
1— Taa 

E= (3.29) 


Once the value of E has been determined, R can be cal- 
culated from Equation (3.27) once Ro is known. Measure- 
ments of distances determined by this method agree well 
with those from independent structural determinations made 
using techniques such as X-ray diffraction in crystals and 
NMR (Chapter 6). 

An advantage of resonance energy transfer is that the 
technique can be used to probe the topology of membrane- 
bound proteins which are not amenable to study by crys- 
tallographic methods and which are difficult to study by 
NMR. An example is the P-glycoprotein which is respon- 
sible for the multiple drug resistance (MDR) phenotype, a 
major problem in cancer chemotherapy. The technique is 
also useful in high-sensitivity detection of changes in struc- 
ture of biomacromolecules and macromolecular complexes 
since such changes result in altered values for R. Recent 
examples of such experiments include studies of conforma- 
tional change in contractile proteins (e.g. actin and myosin) 
on ligand binding, mapping of the active site of enzymes 
(e.g. serine proteases), monitoring duplex and tetraplex 
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Figure 3.29. Resonance energy transfer in fluorescence. (a) Resonance energy transfer involves the transfer of light energy from one 
chromophore to another. The efficiency of this process is proportional to 1/ RÉ, where R is the distance between chromophores. Light from 
chromophore A (solid arrows) is absorbed by chromophore B. (b) If the emission Amax of fluor A overlaps with the absorbance Amax of 
fluor B (e.g. at Az), this energy can be absorbed ad re-emitted by the latter (dashed arrows in (a)) at 43 allowing R to be calculated from 


Equation (3.27). 


formation in DNA and measurement of conformational flex- 
ibility of chromosomes. 


3.4.7 Applications of Fluorescence in Cell Biology 


Because of the high sensitivity of fluorescence, certain fluors 
have found a wide range of applications in microscopy. 
Their use frequently results in a particularly low level of 
background staining, that is a low signal : noise ratio. When 
fluors are used in this way, this experiment is called flu- 
orescence microscopy. It allows the detection and visual- 
ization of specific cell components such as chromosomes, 
organelles, cytoskeleton and membrane-bound proteins ex- 
pressed on the surface of individual cell types (e.g. recep- 
tors). This information is particularly useful in identifying 
the location in or on the cell of particular proteins or nucleic 
acids or macromolecular complexes containing them. 
Acridine orange is a fluor which has similar molec- 
ular dimensions to a DNA base-pair. It therefore inter- 
calates readily into double-stranded DNA and is now 
known to be a potent mutagen. At low ratios of acridine 
orange:polynucleotide, intercalation into double-stranded 
DNA results in a large increase in fluorescence with no 
effect on emission max. In the presence of single-stranded 


polynucleotides such as RNA, however, there is significantly 
less fluorescence and a shift in Amax towards the red end of 
the spectrum. If cells are stained with a high concentration of 
acridine orange they therefore emit green light where there 
is an abundance of double-stranded DNA (e.g. the nucleus) 
while they emit red light where there is an abundance of 
single-stranded RNA (e.g. the cytoplasm). It is possible to 
distinguish the nucleus of any cell using this simple experi- 
ment. 

More specific visualization of cell structure is possible 
with immunostaining. In this technique, cells are prepared 
for microscopy (often by freezing or fixing) followed by 
staining with an antibody coupled to a fluor. This is called 
direct labeling. More commonly, an antibody which is not 
fluorescently labeled (the primary antibody) is used which is 
then detected by a secondary antibody which is fluorescently 
labelled (Figure 3.30). The secondary antibody is specific 
for the primary antibody. This is called indirect labelling 
and it has the advantage that a single secondary antibody 
(e.g. antigoat IgG) may be used to detect a wide range of 
primary antibodies (i.e. all goat IgGs). These secondary 
antibodies are widely available commercially. Moreover, 
because several secondary antibody molecules bind to a 
single primary antibody, indirect labelling is more sensitive 
than direct labeling. In essence this approach is similar to 
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Figure 3.30. Immunofluorescence microscopy. (a) Direct immunostaining. (b) Indirect immunostaining. Secondary antibodies are raised to 
constant parts of immunoglobulins from a particular class and species. Note the amplification of signal possible with indirect immunostaining. 


the ELISA assay and to dot blotting techniques described 
elsewhere (Chapter 5). 

An example of this procedure is the use of fluorescein 
isothiocyanate (FITC) covalently conjugated to an antibody 
raised to a specific antigen (e.g. a structural component of 
the cytoskeleton). When a cell is stained with FITC-labelled 
antibody, only the target protein will be labelled. Light emit- 


ted from the stained sample passes through a filter which 
only selects for light of the emission Amax. In this way we can 
‘see’ the cytoskeleton in the cell. A second antibody with 
a different specificity can be fluorescently labelled in the 
same way and used to stain the same preparation. The two 
antibodies are usually labelled with distinct fluors possess- 
ing different emission max values. By changing the filter 
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Figure 3.31. Jn situ hybridization with fluorescently labelled 
oligonucleotide. A fluor such as fluorescein is incorporated into an 
oligonucleotide of defined sequence. The labelled oligonucleotide 
recognizes a complementary sequence in the target nucleic acid. 
(DNA or RNA) and hybridizes to it. Fluorescence allows visual- 
ization of this hybrid in a microscope. This technique allows us to 
determine the location of specific nucleic acids in cell and tissue 
samples. 


appropriately, it is possible to take photographs of a sin- 
gle sample which will reveal the subcellular location and 
distribution of the two target molecules. This allows us to 
identify proteins or other structures which form part of a 
single complex in or on the cell. 

An alternative strategy for specific labelling of sites in 
the cell is offered by the technique of in situ hybridization. 
This is sometimes referred to as the FISH technique (fluores- 
cence in situ hybridization). Short DNA sequences, oligonu- 
cleotides, can be used to stain cell preparations (Figure 3.31). 
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An oligonucleotide can be designed and synthesized to hy- 
bridize specifically to a particular piece of DNA or RNA (e.g. 
part of a gene, a ribosome or a virus) according to the rules 
of Watson-Crick base pairing. When coupled with the use 
of in situ polymerase chain reaction (which allows amplifi- 
cation of a target sequence), this technique gives a sensitive 
and specific means of locating DNA or RNA in cells. If 
pyrimidine nucleotides labelled with fluorescein are incor- 
porated into the oligonucleotide, double-stranded hybrids 
with the target sequence will be visible. Non-fluorescent 
oligonucleotide labels such as digoxigenin and biotin can be 
used in the same way but these must be detected with the 
aid of fluorescently-labelled antibodies as described above. 
This technique has been used in a wide range of studies in 
biochemistry including elucidation of chromatin structure, 
the detection of viruses and in positional cloning (i.e. the 
location of defective genes in chromosomes by phenotype 
mapping without prior knowledge of the functional basis 
of disease). Jn situ hybridization can also be used in com- 
bination with immunostaining to demonstrate, for example, 
colocalization of mRNA and translated protein or to identify 
the location of viruses in viral infections. 


3.5 SPECTROSCOPIC TECHNIQUES 
USING PLANE-POLARIZED LIGHT 


3.5.1 Polarized Light 


Electromagnetic radiation consists of mutually- 
perpendicular electric and magnetic vectors (Figure 
3.1). In a beam of radiation originating from the sun or 
any other light-source, the electric vectors are randomly 
orientated around the beam axis as shown in Figure 3.32. 
If, however, the electric vectors in one plane could be 
selected a beam of plane polarized or linearly polarized 
light would result. It is possible to achieve this by passing 
unpolarized light through a filter called a polaroid. Plane 
polarized light may be thought of as being composed of 
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Figure 3.32. Plane polarized light. The electric vectors of a beam of light are shown as arrows. In nonpolarized light, these vectors are 
orientated randomly through 360°. A polaroid filter selects for a single orientation of electric vectors resulting in a plane of polarization. 
This is plane-polarized light which is also known as linearly polarized light. 
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Figure 3.33. Circularly polarized light. (a) The electric vector of plane polarized light (solid arrows) is composed of two equal and opposite 
circularly polarized components (dashed arrows). (b) The electric vector of circularly polarized light describes an elliptical path along the 


direction of propogation. 


two circularly polarized vectors of opposite direction but of 
equal strength (Figure 3.33). These are called, respectively, 
left and right circularly polarized components of plane 
polarized light. As described below (Section 3.5.4), we can 
select for left or right circularly polarized light. If we traced 
the path of the electric vector along the axis of a beam of 
circularly polarized light, it would retain equal magnitude 
but describe a helical path (Figure 3.33). 


3.5.2 Chirality in Biomolecules 


Carbon has four valencies and, when bound to four distinct 
chemical groups, has two possible isomeric forms which 
may be referred to as D and L (Figure 3.34). Pairs of isomers 
which are nonsuperimposable mirror images of each other 
are called enantiomers or optical isomers. The a—C of the 
amino acid alanine is called an optical centre or a centre of 
asymmetry to denote the fact that the two isomers arise from 
different arrangements around this atom. In some molecules 
there may be more than one optical centre resulting in four, 
six or more possible stereoisomers (Figure 3.34). Some of 
these structural variants are obviously stereoisomers but are 
not related to each other as enantiomers (i.e. nonsuperim- 
posable mirror images of each other). These are referred to 
as diastereoisomers. 

The existence of enantiomeric pairs is known as chirality 
and molecules capable of existing as enantiomers are called 
chiral molecules. ‘Chiral’ derives from the Greek word for 
hand (cheir) because hands are good examples of nonsuper- 
imposable mirror image objects. L- and D-enantiomers are 
conventionally assigned by comparison with the reference 
compound L-glyceraldehyde. An alternative notation called 
the RS convention gives the absolute configuration of enan- 


tiomers and more information on this is available in the 
bibliography at the end of this chapter. 

The phenomenon of chirality is especially important in 
biochemistry because many biomolecules are chiral. In bi- 
ological systems there is usually a selection for one of a 
pair of enantiomers. For example most amino acids in living 
systems are L-enantiomers with D-enantiomers occurring 
only very rarely (e.g. in antibiotic peptides). Similarly, most 
monosaccharides are D-enantiomers with L-enantiomers oc- 
curring rarely. By contrast, most chemical reactions car- 
ried out in solution result in a 50:50 mixture of D- and 
L-enantiomers. 

Chirality has several important consequences for the 
structure, shape and functional properties of biomolecules. 
For example, L-amino acids result in the right-handed he- 
lices observed in proteins. D-amino acids result in left- 
handed helices which are themselves mirror images of right- 
handed helices while heteropolymers of a mixture of D- 
and L-amino acids cannot form helices at all. Proteases 
are capable of hydrolyzing peptide bonds in polypeptides 
composed of all-L-amino acid substrates but cannot hy- 
drolyze those of all-p composition. Conversely, synthetic 
proteases which can be made chemically with all-p-amino 
acids (e.g. the HIV protease) can cleave all-D polypep- 
tide substrates but not those of an all-L amino acid com- 
position. Indeed, the synthetic HIV protease is itself a 
mirror-image of the natural all-L enzyme showing that chi- 
rality is maintained throughout the hierarchy of protein 
structure. 

It has also been discovered that a pair of enantiomers may 
display distinct toxicity to humans with one enantiomer be- 
ing toxic while the other is nontoxic. A well-known example 
of this is the fertility drug thalidomide which was widely 
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Figure 3.34. Chirality in amino acids. (a) Glycine is an achiral molecule because the four substituents to the œ-C are not all chemically 
different. All the 19 other amino acids commonly found in proteins are chiral. (b) Alanine is a chiral molecule because it has a centre of 
asymmetry (*). In nature, only the L-enantiomer is commonly found. (c) Threonine has two centres of asymmetry and accordingly has four 
possible enantiomers, only one of which (arrow) is commonly found in nature. 


prescribed in the late 1950s. One enantiomer of this com- 
pound was nontoxic while the other was teratogenic leading 
to deformity or lack of limbs in babies born to patients 
treated with the drug. For this reason there is considerable 
interest in the pharmaceutical industry in the possible use of 
enzymes for enantioselective synthesis of new drugs. 


3.5.3 Circular Dichroism (CD) 


In the early nineteenth century the French physicist Jean 
Baptiste Biot observed that solutions of some organic 
molecules appeared to rotate the plane of polarization of 
plane polarized light, a phenomenon referred to as opti- 
cal rotatory dispersion (ORD; Figure 3.35). We now know 
that this is a consequence of the fact that each enantiomer 
of an optically active molecule interacts differently with 
left and right circularly polarized light. By convention, 
laevorotation is designated as (—) while dextrorotation is 


denoted by (+). A 50:50 mixture of equal amounts of D- 
and L-enantiomers does not rotate the plane of polariza- 
tion nor does a solution of an achiral molecule such as 
glycine. 

Light passing through a chromophore solution may in- 
teract with the sample in two main ways. The light may be 
refracted or delayed on passage through the solution or it 
may be absorbed. Refraction is quantified by the refractive 
index, n, of the solution while absorption is quantified by the 
molar extinction coefficient, £. If the light is plane polarized 
and the sample is optically active, each enantiomer may in- 
teract differently with the left and right circularly polarized 
components of the light beam. 

ORD arises from the fact that there is a specific refractive 
index for left (n) and right (ng) circularly polarized light: 


nu # tp (3.30) 
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Figure 3.35. Optical Rotatory Dispersion (ORD). When a beam 
of plane polarized light is passed through a sample of a single 
enantiomer (i.e. a chiral sample), the plane of polarization is rotated 
through an angle, 0. (a) L-enantiomers rotate the plane to the left. (b) 
D-enantiomers rotate the plane to the right. Equal concentrations of 
an enantiomer rotate the plane of polarization through equal (though 
opposite) angles, 0, so that a 50 : 50 mixture does not rotate the plane 
at all. 


The difference in refractive index at any wavelength may 
be expressed as An. An ORD spectrum is a plot of An 
against wavelength (A). 

Similarly, optically active samples have distinct molar 
extinction coefficients for left (e) and right (eg) circularly 
polarized light. This is called circular dichroism (CD): 


EL Æ ER (3.31) 


The difference between £p and ¢gp may be expressed 
as Ae. Combining Equations (3.31) and (3.6) (the 
Beer—Lambert law) means that there is a difference in the 
absorbance of left and right circularly polarized light (A A): 


AA=Ae-c-l (3.32) 


If Ae or AA or ellipticity (see below) is plotted against 
wavelength (A), a CD spectrum is obtained. The CD spec- 
trum of one enantiomer is a mirror image of that of the other 
and is related to the corresponding ORD spectrum (and vice 
versa) by a mathematical transformation called the general 
Kronig—Kramers transformation. Both ORD and CD spec- 
tra are evidence for optical activity in the sample and both 
reflect structure of molecules in the sample, especially of 
chiral biopolymers such as proteins and nucleic acids (Fig- 
ure 3.36). In practice, ORD has now largely been superseded 
by CD spectroscopy. 
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Figure 3.36. Circular dichroism (CD) spectra of biopolymers. Sec- 
ondary structure strongly affects CD spectra of biopolymers. (a) 
Standard spectra of a-helix (—), B-sheet (—), random coil (- - -) 
and #-turns (— - -) of polypeptides. (b) Spectra of double-stranded 
DNA of varying G + C content. 26% (—), 72% (—) and 100% 
(=). Modification of Figures 2.5 and 2.9 in Roger and Norden 
(1997) Circular Dichroism and Linear Dichroism, by permission 
of Oxford University Press. 


3.5.4 Equipment Used in CD 


CD spectra are measured in a special type of spectropho- 
tometer called a CD spectropolarimeter of which an outline 
design is shown in Figure 3.37. Since CD depends on differ- 
ential absorbance, a means of selectively exposing sample to 
left and right circularly polarized light is necessary. This is 
achieved by passing a beam of plane polarized light through 
a photoelastic modulator which is normally a quartz crystal 
subjected to an oscillating electric field. The effect of this is 
to vary the circular polarization of the beam passing through 
the modulator alternately from left to right with a frequency 
of some 50 kHz whilst maintaining a constant light intensity. 

Differential absorption of left and right circularly polar- 
ized light is detected at a photomultiplier and converted 
into ellipticity, 0, which has units of millidegrees. This term 
arises from the fact that selective absorption of one of the 
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Figure 3.37. CD spectropolarimeter. A photoelastic modulator selects at any time for either left or right circularly polarized components 
of plane polarized light. It alternates between these at a frequency of 50 Hz. Selective absorption of either left or right circularly polarized 
light is detected at the photomultiplier and gives a CD spectrum for the sample. Samples which are achiral or composed of 50:50 mixtures 
of enantiomers would give no detectable spectrum in this instrument (i.e. AA = 0 at all à). 


circularly polarized components of plane polarized light has 
the consequence that the resultant electric vector traces an 
elliptical path around the axis of the beam and is said to 
be elliptically polarized (Figure 3.38). In a CD spectropo- 
larimeter, the two light beams are not in fact recombined 
but a photomultiplier detector converts incident light inten- 
sity into an electric current composed partly of alternating 
current (AC) and partly of direct current (DC) components. 
The DC component is related to total light absorption by the 
sample while the AC component is a direct measure of CD. 
This arrangement facilitates separate absorption measure- 
ments of the right and left circularly polarized components 
of plane polarized light. 
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Figure 3.38. Ellipticity. A beam of plane polarized light consists of 
electric vectors of equal amplitude. When recombined, the resultant 
vector from the left (L) or right (R) circularly polarized components 
of plane polarized light would describe a circle (incident beam). If 
this light is passed through a chiral sample, one circularly polar- 
ized component is selectively absorbed. This lowers the amplitude 
associated with that component (R in the sample above). When re- 
combined, the resultant electric vector now traces an elliptical path. 
The degree of ellipticity is a measure of circular dichroism. 


Ellipticity may be converted to units of absorbance using 
the following equation: 


(3.33) 


3.5.5 CD of Biopolymers 


The synthetic homopolymer poly-L-lysine adopts a random 
coil structure at neutral and acid pH values. However, at 
high pH values, a mainly œ-helical conformation is adopted 
which may be converted to predominantly antiparallel B- 
sheet by gentle heating. Each of these three forms of poly- 
L-lysine gives a characteristic CD spectrum in the range 
190-240 nm and the fact that similar spectra are obtained for 
homopolymers of other amino acids suggests that they arise 
predominantly from asymmetry of the polypeptide back- 
bone. Measurement of CD spectra in proteins and peptides 
of unknown secondary structure is often used (by analogy 
with such homopolypeptides) to estimate empirically their 
percentages of common secondary structure features. Be- 
cause of the fact that they depend on comparison with sim- 
ple homopolymers and exclude effects due to amino acid 
side-chains, such estimates are not as directly meaningful 
as those arising from more direct structural methods such 
as X-ray diffraction and two-dimensional NMR (Chapter 
6). However, CD spectra are obtainable with a wide range 
of proteins and peptides at relatively low sample concen- 
trations in aqueous solution and thus avoid some of the 
limitations of these methods. Moreover, methodologies for 
analyzing spectra in terms of the main secondary structure 
conformations are now very reliable. 

In practice, CD spectra provide a generally useful in- 
dex of structure in proteins (Example 3.3). Changes in CD 
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Example 3.3 Circular Dichroism studies of structural stability of tetrameric p53 


p53 is a tumour suppressor protein which is the most commonly mutated protein in human tumours. It is therefore 
of considerable interest to an understanding of why some cells develop into cancer cells leading to the formation of 
tumours. These mutations make this protein dysfunctional in over 50% of human tumour samples from which it has been 
sequenced. A functionally important part of the structure of p53 is the tetramerization domain which includes residues 
325-355. In intact p53, this domain is responsible for forming a tetramer between four individual polypeptides. Even 
when expressed as a short peptide, this tetramerization activity is conserved resulting in the smallest known protein 
tetramer. 

In this example, CD was used together with fluorescence (Section 3.4) and differential scanning calorimetry (DSC; 
Chapter 8) in a study of the structural stability of this tetramerization domain. The following plot shows how ellipticity 
of a small concentration (2.5 uM) of the protein varies with temperature. The intersection of all the traces at a single 
point called the isodichroic point is characteristic for a two-state system such as a protein only existing in either a folded 
or unfolded state (Section 3.4.5). As the temperature is increased from 21 to 81 °C, the ellipticity measured in the region 
of 220 nm decreases closer to 0. 

Such measurements allow calculation of the fraction of the protein unfolded. This allowed determination of unfolding 
curves at various concentrations of protein in the range 0.5—20 mM as shown in the following plot. 
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These data were combined with DSC measurements to confirm that this model system unfolds according to a two-state 
model from native (N) to unfolded (U) states; 


NZU 
This is consistent with a structural model derived from two-dimensional NMR (Chapter 6) which indicates that hydropho- 
bic interactions along the intersubunit interface is responsible for the bulk of stabilization of the tetrameric structure in 


this important protein. 


Source: (From Johnson et al., 1995) Reprinted with permission from Johnson, C.R. et al., Biochemistry 34, 5309-5316 © 1995 American Chemical 
Society. 


spectra may therefore be taken to reflect perturbation of 
structure. For example, binding of a ligand to a protein 
or protein folding/unfolding can be conveniently followed 
by CD spectroscopy. A particular advantage of CD in this 
regard is the short time-scale of the CD measurement com- 
pared to that of other spectroscopic measurements. Since 
absorption of a UV photon takes some 107!5 s (compared 
to 107° s for the radiowave radiation used in NMR; Section 
3.7), CD measurements may be expected to detect ligand- 
protein effects of shorter duration than those detectable by 
NMR. Even though many ligands may be intrinsically achi- 
ral, their unique orientation on binding to a protein coupled 
with their specific interaction with the protein can cause 
them to acquire CD properties for the duration of the interac- 
tion. This phenomenon is known as induced CD. Examples 
of this phenomenon include CD spectra induced on bind- 
ing of bilirubin to bovine serum albumin and of dicumarol 
derivatives to a-1 acid glycoprotein. 

The second major class of biopolymers which has been 
studied by CD is the nucleic acids. Of the nucleotide’s struc- 
tural components, only the pentose sugar is chiral. Mainly 
as a result of the presence of this sugar, nucleotides are 
intrinsically chiral structures and give measurable, albeit 
weak, CD spectra. As the level of structural order increases, 
however (e.g. polymerization of nucleotides into polynu- 
cleotides followed by assembly into duplex structures such 
as double-stranded DNA and tRNA), the asymmetry of the 
system and hence the strength of the CD spectrum obtained 
also increases. CD is therefore a useful measure of structure 
in nucleic acids. Differences are observed in CD spectra 
due to variation of nucleotide sequence, GC composition, 
stacking of bases in different forms of DNA (i.e. A-, B- and 
Z-DNA), formation of macromolecular assemblies such as 
ribosomes and nucleosomes and ligand binding to DNA. 
As with proteins, induced CD is also possible with DNA 
such as when anthracene-9-carbonyl-N!-spermine binds to 
oligonucleotides of defined sequence. 

Many chromophores associated with biomacromolecules 
are themselves chiral and CD measurements of their spectra 
can give information on their interactions with proteins, 
DNA or macromolecular complexes. 


3.5.6 Linear Dichroism (LD) 


Electronic transitions underlying absorption of ultraviolet 
and visible light occur in individual molecules in a partic- 
ular direction or axis. Molecules dissolved in an aqueous 
solvent are randomly arranged so that the sample will give 
the same absorption spectrum regardless of the direction 
of a beam of incident light (Figure 3.39). If the molecules 
were all orientated in a single direction, however, different 
absorption spectra would be obtained depending on the di- 
rection of the light beam. In linear dichroism this fact is 
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Figure 3.39. Linear dichroism (LD) in orientated samples. The 
axis of a single spectroscopic transition in each sample molecule 
is denoted by a solid arrow. A; and A2 refer to absorbances cor- 
responding to this transition in directions shown by dashed arrow. 
(a) Randomly orientated sample. Axes of transitions are randomly 
arranged. A; = Ao. (b) Orientated sample. If all sample molecules 
are orientated in a single direction, Ay # A2. 


exploited by exposing orientated samples to plane polar- 
ized light. Extreme cases are presented where the incident 
beam is parallel to the axis of a particular transition and 
where the beam is perpendicular to the transition axis. Lin- 
ear dichroism (LD) is the differential absorbance of these 
incident beams as follows: 

LD = A, — A, (3.34) 
where the incident beam is perfectly parallel to the axis 
of the transition, LD > 0. If the polarization is perfectly 
perpendicular to this axis, then LD < 0. 

In LD spectroscopy, samples are orientated in a single 
axis relative to the incident linearly polarized light and it 
is possible to vary the relative orientation of sample and 
light such that the incident light is parallel or perpendicular 
to this axis. A wide variety of physico-chemical means of 
achieving this orientation have been described (Figure 3.40). 
In the stretched polymer technique, orientation is achieved 
by absorbing sample onto a polymer (e.g. polyvinylalcohol) 
either before or after stretching it in a particular direction. 
The long axis of the sample molecule aligns in the stretch 
direction. Other popular techniques include flow orientation, 
electric field orientation and squeezed gel orientation. 


3.5.7 LD of Biomolecules 


LD spectroscopy is well-suited to the study of certain sam- 
ple molecules such as fibrous proteins and DNA since these 
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Figure 3.40. Some orientation methods used in linear dichroism. (a) Stretched polymer orientation. Sample is absorbed to stretched polymer 
either before or after stretching in a particular direction. The long axis of sample molecules aligns in this direction. (b) Flow orientation. A 
cylindrical coquette flow cell consists of an internal and external cylinder. Sample is placed in the gap between these cylinders. The inner 
cylinder is rotated causing the sample to flow by viscous drag as shown by arrows. The long axis of sample molecules aligns with direction 
of flow. (c) Electric field orientation. An electric field is created around the sample by parallel electrodes, Excessive heating of sample can be 
avoided by use of pulsed fields as described in Chapter 5. (d) Squeezed gel orientation. Protein is embedded in a gel which is mechanically 
squeezed either bidirectionally or unidirectionally. This results in a rectangular or square squeezed gel. 


possess a high axial ratio (Figure 5.36) and are readily orien- 
tated in a particular direction. Information has been obtained 
from such studies about the relative orientation of DNA 
bases, intercalation of agents such as ethidium bromide into 
DNA, binding of drugs at the major and minor DNA grooves 
and flexibility of DNA (Example 3.4). ‘Naturally orientated’ 
biopolymers such as membrane proteins have also been ex- 
tensively studied by LD and this has yielded information 
about the orientation of chromophores in photosynthetic re- 
action centres during excitation. As with CD, the short time- 
scale of LD measurements makes this technique especially 
appropriate to such investigations. LD has also been used in 
studies of chromatin structure and of binding of proteins to 
DNA. 

A limitation of LD in the study of biomolecules is the 
necessity for orientated samples. Globular proteins are par- 
ticularly difficult to orientate and LD has not been as widely 
used in investigations of such samples. 


3.6 INFRARED SPECTROSCOPY 


Table 3.1 shows that energy increments (AEs) of lower 
value than those involved in electronic transitions corre- 
spond to frequencies in the infrared part of the spectrum 
(i.e. A = 10°-10° nm; see Section 3.2.2). Vibrational energy 
levels are quantized in the same way as electronic energy 
levels and, if infrared light of a frequency corresponding 
exactly to the AE between two vibrational energy levels is 
absorbed by a molecule; it may undergo a vibrational tran- 
sition. As shown earlier (Figure 3.8), the various vibrational 
energy levels are mainly due to differences in bond length 


and angle which are possible due to stretching and bending 
of bonds. 


3.6.1 Physical Basis of Infrared Spectroscopy 


When atoms come together to form a covalent bond, they 
undergo an electronic rearrangement which involves two 
competing sets of forces. The positively-charged nuclei of 
the two atoms tend to repel each other (electrostatic repul- 
sion) while the nucleus of each atom is attracted to the 
negatively-charged electrons of the other (electrostatic at- 
traction). The mean distance settled on between the atoms 
(i.e. the bond length) is a reflection of a point of balance be- 
tween these competing attractive and repulsive forces and, 
in general, will be characteristic for a particular chemical 
bond. For example, in polypeptides C*—C bonds are 1.52 A 
while N-C* bonds are 1.45 A. It is possible to plot the en- 
ergy of a molecule as a function of the interatom distance, 
r, in a Morse diagram (Figure 3.41). Most molecules in a 
population at rest are at the minimum of this diagram, but 
individual bond lengths can vary by +0.5 A and bond angles 
by +5° at room temperature. We can think of each chemical 
bond, therefore as existing in a particular vibrational energy 
level characterized by a particular bond length, bond angle 
and electron density. 

However, if radiation of an appropriate frequency is 
passed through the sample, it is possible for the molecule 
to undergo a transition to a higher vibrational energy level 
by absorption of radiation. As described in Section 3.2.2, 
vibrational energy levels are quantized in the same way 
as electronic energy levels although the AE values associ- 
ated with vibrational transitions are much smaller than those 
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Example 3.4 Linear Dichroism study of ligand binding to DNA 


4’,6’-Diamidino-2-phenylindole (DAPI) binds to the minor groove of AT-rich double-stranded DNA to form highly 
fluorescent complexes. This type of interaction with DNA is of major interest since it is known that chemical compounds 
can interact in a variety of ways with DNA such as the formation of adducts or intercalation between base-pairs. 
These interactions can result in important biological effects such as mutagenesis and/or carcinogenesis. DAPI is a 
particularly interesting model compound in this regard as it has found widespread use in fluorescent labelling of cells and 
chromosomes. In this example, LD was used to probe the interaction of DAPI with heteropolymers of the form (A-T),. 
LD spectra were obtained for a range of ratios of DAPI:DNA as follows. 
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These spectra show a strong negative value in the region around 260 nm which is consistent with the planes of the 
bases being perpendicular to the helix axis. The strong positive spectra in the region 300-400 nm are due to the DAPI 
itself. These data are consistent with DAPI binding to the DNA at an angle of 43—46° at low DAPI:DNA ratios. At higher 
ratios, there is some ‘stiffening’ of the DNA and much higher affinity binding of DAPI suggesting that two different 
modes of binding operate depending on the ligand concentration and that there is a change in structure in DNA as a result 
of ligand binding; that is allosterism. This illustrates how LD can provide detailed information on ligand binding to DNA. 


Source: (From Eriksson et al., 1993) Reprinted with permission from Eriksson, S. et al., Biochemistry (USA) 32, 2987-2998 ©) 1993 American Chemical 
Society. 


associated with electronic transitions. A vibrational tran- 
sition from the ground state to the first excited state due 
to absorption of infrared light is called a fundamental ab- 
sorption and the frequency, v, associated with this is called 
the fundamental frequency. Whilst other transitions are also 
possible (e.g. from the ground to the second or third excited 
state), they occur much less frequently and represent weak 
absorbance. 

An infrared spectrum consists of a plot of absorbance 
versus frequency or wavenumber (1/2). In comparison with 
absorbance spectra in the ultraviolet/visible range, infrared 


spectra of small molecules consist of narrow lines rather than 
broad peaks (Figure 3.42). However, because of the large 
number and variety of bonds in macromolecules, infrared 
spectra of proteins and DNA consist of a small number of 
broad peaks. 

Water gives a strong infrared spectrum because of its 
high absorption coefficient in the infrared range and high 
concentration (55 M). For this reason, infrared spectroscopy 
may be carried out either on nonaqueous samples prepared 
as dry films or on samples dissolved in alternative sol- 
vents such as D20 or chloroform. It is also possible to 
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Figure 3.41. Physical basis of infrared spectroscopy. When in- 
frared light of an appropriate frequency is absorbed, the molecule 
may be promoted through an energy increment AF to the first vi- 
brational energy level which will have a different value for either 
bond length or bond angle. This is called the fundamental absorp- 
tion. Different bond types have individual vibrational energy levels 
which has the consequence that the infrared absorption spectrum is 
characteristic for a specific chemical structure. 


subtract the spectrum due to the water solvent by difference 
spectroscopy (Figure 3.15) but this requires high concen- 
trations of protein or DNA (5—20%). These conditions are 
generally unsuitable for the study of biomacromolecules in 
their native state and this limits the practical application of 
simple infrared spectroscopy to their study. However, the 
technique has become standard for the study of low molec- 
ular mass biomolecules and yields structural information 
complementary to that available from other methods such 
as mass spectrometry (Chapter 4), CD (Section 3.5) and 
NMR (Section 3.7). Study of biomacromolecules by infrared 
spectroscopy requires more elaborate techniques which are 
described below (Sections 3.6.4 and 3.6.5, respectively). 


3.6.2 Equipment Used in Infrared Spectroscopy 


The overall design of an infrared spectrophotometer is very 
similar to that of the ultraviolet/visible spectrophotometer 
shown in Figure 3.9. The main differences are in the technol- 
ogy used for diffracting light and for detection of transmitted 
light. The monochromator used in an infrared spectropho- 
tometer is usually a diffraction grating. It will be recalled 
that either a prism or a diffraction grating may be used in 
the ultraviolet-visible spectrophotometer. The detector com- 
ponent of an infrared spectrophotometer is a thermocouple 
rather than a photocell normally used in a ultraviolet-visible 
spectrophotometer. This basic infrared spectrophotometer 
suitable for collecting spectra from small molecules is called 
a dispersive or grating infrared spectrophotometer. 
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Figure 3.42. Infrared spectrum of KCN. The spectrum of potas- 
sium cyanide in the frequency range 2000-4000cm~! is shown. 
One hundred percent transmittance is equivalent to an absorbance 
of 0. Note the sharpness of the lines corresponding to transitions. 
An infrared spectrum of a molecule such as a protein usually con- 
sists of much broader bands. Gans (1971) Vibrating Muscles: An 
Introduction to the Interpretation of Infrared and Raman Spectra, 
Chapman & Hall, with kind permission from Kluwer Academic 
Publishers. 


3.6.3 Uses of Infrared Spectroscopy 

in Structure Determination 

In principle, any chemical bond of a given type (e.g. C-H) 
might be expected to have a fundamental absorption identi- 
cal to that of any other bond of the same type. In practice, the 
chemical environment of the bond has an effect on the pre- 
cise frequency of absorption, since this may alter the bond’s 
electron density. Studies of infrared absorption from a large 
number of molecules of known chemical structure suggest 
that absorbance of infrared light near particular frequencies 
are characteristic for specific chemical groups (Table 3.5). 
These group frequencies allow us to determine aspects of the 
molecular structure of small molecules from their infrared 
absorbance pattern alone. It should be noted that many of the 
chemical groups detectable by infrared spectroscopy give 
little or no absorbance in the ultraviolet or visible parts of 
the spectrum. A further advantage of infrared spectroscopy 
is the variety of sample forms which can be analysed. It 
is possible to obtain infrared spectra of solutions, films or 
powders, and of crystalline samples. 

While dispersive infrared spectroscopy has limited appli- 
cability to the study of biopolymers generally, it has found 
some applications in structural studies on DNA. The forma- 
tion of hydrogen bonds and tautomerization between C=O 
(keto) and C—OH (enol) forms of nucleotide bases have 
detectable effects on infrared spectra. 


Table 3.5. Group frequencies in the infrared region 


Chemical group Frequency (cm7!) 


CH, 1460 
_CH,— 2930 
2860 
1470 
C-H 3300 
c-c- 1165 
-C=0 1730 
-C-H (in CH3) 2960 
2870 
-C-H (in CHO) 2870 
H 2720 
Cy 3060 
-CN 
-0-0— 1200—1100 
-OH 3600 
_NH, 3400 
=CH, 3030 
-SH 2580 
-C=N- 1600 
c-cl 725 
C=S 1100 


3.6.4 Fourier Transform Infrared Spectroscopy 


The dispersive infrared spectrometer described in Section 
3.6.2 allows scanning of comparatively short parts of the 
infrared spectrum in each measurement. The complete spec- 
trum needs to be scanned segment by segment which is a 
comparatively slow process. Moreover, such measurements 
have serious limitations for the study of DNA and proteins 
as mentioned above. An alternative approach is the incorpo- 
ration of a Michelson interferometer in a Fourier Transform 
infrared (FTIR) spectrometer which has now superseded 
dispersive instruments in biochemistry. The basic design of 
this is shown in Figure 3.43. An incident IR beam may be 
split by passage through a half-mirror or beam splitter made 
up of a material (e.g. crystalline KBr or a thin mylar film) 
which will transmit half of the light and reflect the other 
half. This beam splitter is positioned at 45° to the incident 
beam which is parallel to a stationary mirror. The beam 
transmitted by the beam splitter shines on a moving mirror 
arranged at 90° to the incident beam. 

The two beams are reflected by the fixed and moving mir- 
rors, respectively and recombine to form a resultant beam. 
If the two beams are in phase, the resultant beam is the sum 
of the two beams (in accordance with the principle of su- 
perposition mentioned in Section 3.1.1). This is a process 
called constructive interference (Figure 3.44). If the two 
beams are out of phase then the resultant beam is weaker 


SPECTROSCOPIC TECHNIQUES 87 


than the sum of the two beams which is called destructive in- 
terference. Whether constructive or destructive interference 
occurs depends on the relative positions of the two mirrors 
at any moment in time. In practice, a number of scans are 
performed in the absence of sample and then repeated with 
sample present. These scans are very rapid and cover a wider 
part of the infrared spectrum than is possible in a dispersive 
spectrometer. The interference patterns obtained are added 
together and analysed by a mathematical tool called the 
Fourier transform. It should be noted that this is also used 
to analyse the diffraction pattern of X-rays in protein crys- 
tallography and in the analysis of multi-dimensional NMR 
spectra (Chapter 6; Appendix 2). 

FTIR spectroscopy is essentially an interference-based 
technique rather than one based on absorption. Each 
molecule gives a characteristic FTIR spectrum which re- 
flects its chemical structure. Moreover, proteins and DNA 
are amenable to FTIR analysis since the effects of solvent 
water can be largely excluded. The technique is applicable 
to most classes of biomolecules including lipids, glycolipids 
and oligosaccharides. 

FTIR has proved particularly useful in the study of the 
structure and dynamic properties of proteins and peptides 
(Example 3.5). Hydrogen bonding between peptide bonds 
underlies secondary structure in proteins. Studies of the in- 
frared spectra of synthetic homo-polypeptides led to the em- 
pirical identification of a number of peaks or amide bands 
associated with such hydrogen bonds. For example, an ab- 
sorption in the region 1597-1672 cm! (the amide I band) is 
predominantly associated with stretching vibrations of the 
C=O bond. Other amide bands arise from a combination of 
effects. For example, the amide II band (1480-1575 cm7!) 
is due partly to N-H bending (50%) and partly to C-N 
stretching (40%). Since hydrogen bonding strongly affects 
the electron density of atoms involved in the peptide bond, 
it is to be expected that amide bands could be used as a 
measure of secondary structure in synthetic polypeptides. 
In proteins, FTIR measurements have also proved useful for 
the identification of helices, turns and B-sheet secondary 
structure in both fibrous and globular proteins although in- 
terpretation of such measurements is more complicated than 
for peptides. These complications are introduced by overlap 
between different absorption maxima, slight deviations from 
ideal secondary structure in many proteins and limitations in 
data analysis. Notwithstanding these limitations, FTIR pro- 
vides a sensitive means of probing secondary structure in 
proteins. In particular, structural changes due to alterations 
in pH, ion strength, pressure or temperature, ligand binding, 
adsorption to solid surfaces, aggregation and folding may 
readily be detected and analysed by FTIR. The technique is 
complementary to X-ray crystallography, CD/LD and fluo- 
rescence measurements and is suitable for a wide range of 
samples including membrane proteins. 
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Figure 3.43. The Michelson interferometer as used in FTIR spectroscopy. The beam splitter divides an incident beam into a transmitted 
and reflected beam, respectively. These are reflected by a stationary mirror and a moving mirror, respectively. The reflected beams are 
recombined at the beam splitter and the resultant beam passes through the sample. The reflected and transmitted beams can interfere with 
each other (Figure 3.44) depending on the relative positions of the stationary and moving mirrors. The resultant beams detected are analysed 
by the Fourier Transform (Appendix 2) to generate an FTIR spectrum which is characteristic of the sample. 


(a) Beam 1 


Resultant beam 


(b) Beam 1 


Beam 2 


Resultant beam 


Figure 3.44. Interference between waves. (a) Constructive interference. If superimposed wave beams (of equal amplitude A) are in phase, 
they reinforce each other such that the resultant beam has amplitude equal to the sum of that of the superimposed waves (i.e. 2A). (b) 
Destructive interference. If superimposed beams are 180° out of phase, they weaken each other such that the resultant beam has amplitude 
of 0. Where the beams are less than 180° out of phase, the resultant beam would have amplitude between 0 and 2A. 
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Example 3.5 Characterization of photoconversion of the green fluorescent protein of the jelly fish Aequorea 
victoria by FTIR 


The jelly fish, Aequorea victoria, possesses a green fluorescent protein (GFP) which functions as the secondary emitter 
of chemiluminescence from this organism (Section 3.4.1). The chromophore involved in this process is formed by 
cyclization of the tripeptide Ser-Tyr-Gly which makes up residues 65-67 followed by dehydrogenation of Tyr-66. This 
chromophore can exist in a neutral form (Amax = 395 nm; GFP395) or an anionic form (Amax = 480 nm; GFP4g9). Exposure 
to ultraviolet light converts most of the chromophore from the neutral to the anionic form. Once formed, the anionic form 
requires several hours to reconvert back to the neutral form. The following drawing shows the absorption spectra of the 
two forms of the protein. 
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Structures determined by X-ray diffraction (Chapter 6) of wild type and mutant forms of this protein corresponding to 
GFP395 and GFP4go suggested that Glu-222 might function as a proton acceptor. 

FTIR spectroscopy is highly sensitive to alterations in protonation state between protein conformations. In particular, 
difference FTIR spectroscopy where the FTIR spectrum of one protein conformation is subtracted from the other is 
informative about events such as protonation/deprotonation. In the present example, an FTIR spectrum was recorded for 
GFP3o5 and, after photoconversion, this was subtracted from the spectrum for GFP4g9. The difference FTIR spectrum for 
GFP is compared in the following illustration with those of photoactive yellow protein (PYP, another photoactive protein) 
and p-coumaric acid (pCA), a model chromophore. This allowed allocation of peaks at 1580, 1497 and 1147 cm™! to the 
phenolic ring common to all three chromophores. 

If the -COO™ group of a Glu residue were protonated during photoconversion, this would be expected to give rise 
to a new peak near 1730 cm~! which would appear in the difference spectrum as a minimum. In PYP a Glu residue is 
known to be protonated during photoconversion and a minimum due to this is evident in the difference spectrum for this 
protein (arrow). The fact that there is no corresponding minimum in the spectrum for GFP allowed the conclusion that 
less than 0.1 Glu residues of the GFP protein changes protonation state on photoconversion. Thus, using FTIR difference 
spectroscopy, it was possible to exclude the suggestion that Glu-222 acts as a proton acceptor during this process. 


Source: (From van Thor et al., 1998) Reprinted with permission from Thor et al., Biochemistry 37, 16915-16921 (c) 1998 American Chemical Society. 
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Hydrogen bonding also underlies secondary structure in 
DNA and RNA and in DNA-RNA complexes. FTIR spec- 
troscopy has found widespread use in the study of many 
biological processes involving nucleic acids. These stud- 
ies range from in vitro experiments with synthetic oligonu- 
cleotides or their chemical analogues to investigations of tu- 
mour samples from cancer patients. The effect of nucleotide 
composition and sequence on base-pairing, the process of 
DNA-RNA hybridization and potential of DNA/RNA ratio 
in ‘grading’ the severity of cancer in human patients have 
all recently been investigated. 


3.6.5 Raman Infrared Spectroscopy 


In addition to absorbance of light, sample molecules can 
also scatter light and this can be detected at 90° to the di- 
rection of the incident beam. Scattering of monochromatic 
light is of two general types (Figure 3.45). Rayleigh scatter- 
ing occurs when the frequency of the scattered light is the 
same as that of the incident light. Most of the light scattered 
by asample results from this process. Much less commonly, 
the frequency of the scattered light may be greater or less 
than that of the incident light. This is a phenomenon known 
as Raman scattering and it forms the basis of Raman spec- 
troscopy. Both of these phenomena occur with light of all 
frequencies and if the incident beam is composed of visible 
light then the scattered light will also be in the visible range. 
Because Raman scattering occurs with such a low probabil- 
ity, it is usual to use a high-energy laser emitting light in the 
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Figure 3.45. Light scattering. Incident light passes through sample 
solution (solid arrow). A proportion of this light is also scattered 
in all directions (dashed lines). This scattered light is mostly of 
the same frequency as the incident beam (Rayleigh scattering). 
However, occasionally the scattered light has a different frequency 
to the incident beam (Raman scattering). Measurement of these 
latter scattered beams is the basis of Raman spectroscopy. 
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Figure 3.46. Raman spectrometer. Monochromatic light is passed 
through a concentrated sample. Scattered light (dashed lines) is 
collected with the aid of a curved mirror. This is then diffracted 
through a monochromator which allows the identification of fre- 
quencies due to both Rayleigh and Raman scattering. Note that the 
lower mirror reflects transmitted light back through the sample to 
double the intensity of scattered light. 


Mirror 


high-frequency part of the visible spectrum as the incident 
light (Section 3.9). 

The relevance of Raman spectroscopy to infrared spec- 
troscopy lies in the fact that the energy-differences be- 
tween incident and Raman scattered light corresponds to 
vibrational transitions. Raman spectroscopy is therefore 
a means of studying vibrational transitions, even though 
the light used is usually in the visible part of the spec- 
trum. Moreover, since the phenomenon is based on scat- 
tering rather than absorption, it is possible to measure 
Raman spectra in aqueous solutions without the problems 
encountered due to water absorption in dispersive infrared 
spectroscopy. 

The experimental apparatus used to measure Raman spec- 
tra is shown in Figure 3.46. Since the effects measured occur 
with such low probability, it is necessary to use highly- 
concentrated samples (i.e. in the range 20 mg/ml) to detect 
scattered light. Moreover, it is essential that the sample does 
not aggregate under these conditions as this may change the 
properties of the spectrum obtained. 

In Raman scattering, a small amount of energy is trans- 
ferred from the incident light (frequency, v;) to promote 
the sample molecule from the ground state to an excited 
vibrational level. In other words light energy causes a vi- 
brational transition. Since E = hu (Equation (3.2)), this loss 
of energy causes a small decrease in the frequency of the 
scattered light to a value v. This decrease can be expressed 
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Figure 3.47. Raman spectrum of CCl4. Light scattered by Rayleigh 
scattering has a Av of 0. Lines corresponding to v — vj are called 
Stokes lines and these have a characteristic values for each chemical 
structure. Corresponding, though much weaker, lines are observed 
at v + vi which are called anti-Stokes lines. 


as Av: 


Av=v- v (3.35) 


where v is the frequency of Raman scattered light. Passage 
through a monochromator allows diffraction of scattered 
light into its various frequencies and a Raman spectrum is 
frequently a plot of light intensity versus Av (Figure 3.47). 
Most of the scattered light has the same frequency as the 
incident light, v;, that is Av = 0. This is called the Rayleigh 
line of the spectrum. However, other lines in the spectrum 
have measurable Av values which depend on the structure 
of the sample molecule. These are called Stokes lines. 

A second way in which the sample molecule and inci- 
dent light can interact is where some molecules existing in 
the first excited vibrational energy level impart energy to 
the incident light thus decreasing the frequency of Raman 
scattered light. This gives rise to lines in the spectrum of 
opposite sign but identical Av to the Stokes lines which are 
called anti-Stokes lines. Because most molecules in a popu- 
lation are in the ground state, vibrational transitions leading 
to a decrease in frequency are statistically much more likely 
than those leading to an increase in frequency. Therefore, 
anti-Stokes lines are usually much weaker than Stokes lines. 
It should be clear from this that a sample molecule will give 
a Raman spectrum characteristic for its chemical structure. 
Since this spectrum is due to vibrational transitions, the in- 
formation contained in such a spectrum is no different to 
that contained in a dispersive infrared spectrum. However, 
Raman spectroscopy has the important practical advantage 
that it is not affected by solvent water. 
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In addition to studies on samples prepared in solution, it is 
also possible to measure Raman spectra on samples prepared 
as dry fibres or as single crystals. Thus it is possible to com- 
pare Raman spectra collected in the presence and absence of 
solvent water. Structure elucidation of biomacromolecules 
involves collection and analysis of X-ray diffraction data 
from samples also prepared as fibres or crystals (Chapter 6). 
Raman spectroscopy therefore provides a convenient means 
of confirming structural data obtained from X-ray crystal- 
lography. 


3.7 NUCLEAR MAGNETIC 
RESONANCE (NMR) SPECTROSCOPY 


3.7.1 Physical Basis of NMR Spectroscopy 


Atoms and molecules have a variety of quantized energy 
levels (e.g. electronic, vibrational and rotational). Many 
spectroscopic techniques take advantage of transitions be- 
tween these energy levels with different AE values being 
related to particular frequency-ranges of the electromagnetic 
spectrum by Equation (3.2). 

A further property of atoms which is quantized is mag- 
netic spin and nuclear magnetic resonance (NMR) spec- 
troscopy is a spectroscopic technique which takes advantage 
of this fact. To explain the physical basis of NMR we can 
use the hydrogen nucleus, 'H, as a good example (this type 
of NMR is called proton NMR). However, as will become 
clear (Section 3.7.2), magnetic spin is a property of many 
different types of atoms. The 'H nucleus may be regarded 
as a spinning positive charge (Figure 3.48). This generates a 
magnetic field which will have a magnetic spin moment, y. 
If an external magnetic field (field-strength, Bo) is applied to 
such a nucleus, it can orientate itself either with (parallel) 
or against (antiparallel) this field in much the same way 


Nucleus 


Figure 3.48. Physical basis of NMR. A spinning nucleus generates 
a magnetic field with a spin moment, u, which generates an angular 
momentum, j. When placed in an external magnetic field (Bo), the 
nucleus can align (a) with or (b) against the field. The difference 
in energy between these orientations is AE which corresponds to 
frequencies in the radiowave part of the spectrum. 
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as a bar-magnet does in the Earth’s magnetic field on the 
macroscopic scale. These two orientations are referred to 
as spin states and are distinguishable by their different spin 
quantum numbers, my, which are, respectively, —1/2 and 
+1/2. The magnetic spin moment ‘wobbles’ or precesses 
around the axis of the external magnetic field by an angle, 
0, and rotates around this axis with a particular frequency, 
a, which is called the Larmor frequency. 

The potential energies of the two spin states are given by 
Equations (3.36) and (3.37): 


1. (Low energy spin state; m; = — 1/2) 

E=-—p- Bo-sind (3.36) 
2. (High energy spin state; m; = +1/2) 

E = u - Bo - sin0 (3.37) 


The energy difference, AE, between them is therefore 
given by; 


AE = 2u - Bo- sind (3.38) 


In fact, at room temperature the energy difference be- 
tween the two spin states is very small and the low energy 
orientation is only marginally favoured over the high energy 
one (Figure 3.49). However, Equation (3.38) shows that AE 
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Figure 3.49. Effect of applied magnetic field on spin states. In the 
absence of an applied magnetic field (i.e. zero field), the energy 
difference (AE) between the two spin states is very small. The 
stronger the field strength (Bo) of an applied magnetic field becomes 
the larger AE (Equation (3.38)). 


varies in a manner which is directly proportional to the ap- 
plied magnetic field. Early work on biomolecules used mag- 
netic field strengths of only approximately 40 MHz. Modern 
NMR spectrometers (Section 3.7.5) use much larger field 
strengths (in the range 500-800 MHz) which give rise to 
larger AE values and yield NMR spectra of much higher 
resolution. 

Combining Equations (3.38) and (3.2) shows that the elec- 
tromagnetic frequency, v, corresponding to a transition be- 
tween these energy levels is: 


Ta 2u - Bo- sind (3.39) 
h 
where h is Planck’s constant. 

NMR spectroscopy depends on absorption of electromag- 
netic radiation from the radiowave part of the spectrum (Fig- 
ure 3.7) causing the nucleus to undergo a transition from a 
low to a high energy spin state. The precise value of v re- 
quired for this transition depends on both the identity of the 
nucleus (Section 3.7.2) and on its precise chemical envi- 
ronment (Section 3.7.3). Because of this, NMR spectra can 
yield precise information on the structure/composition of 
biomolecules and on processes in which they are involved 
(e.g. chemical reactions). 

In order to undergo an NMR transition at a particular value 
of v, a specific set of circumstances called the resonance 
condition needs to exist. As previously shown (Figure 3.1), 
electromagnetic radiation of frequency v has an associated 
oscillating magnetic field. This field (of field strength B,) 
may be regarded as rotating around the spinning nucleus ina 
plane at right angles to the applied external field, By (Figure 
3.50). However, it is much weaker than the applied external 
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Figure 3.50. The resonance condition. Exposure of the nucleus to 
radiofrequency radiation sets up a magnetic field (of field strength, 
Bı, shown in grey) which has a frequency of oscillation. The res- 
onance condition occurs when this frequency equals the Larmor 


frequency of the spin magnetic moment. Transition between spin 
states only occurs at the resonance condition. 


field (i.e. B} << Bo). The resonance condition is achieved 
when the frequency of rotation of the field represented by B, 
equals the Larmor frequency. The relationship between the 
Larmor frequency and the radiowave frequency is given by: 


@=2n-v (3.40) 

This requirement for the resonance condition explains 
the inclusion of the word ‘resonance’ in ‘nuclear magnetic 
resonance’ . Originally, this was achieved experimentally by 
holding the Larmor frequency at a fixed value (i.e. by ex- 
posing the sample to a continuous wave of electromagnetic 
radiation of frequency v) and varying the strength of the 
applied magnetic field (Bo). However, modern NMR spec- 
trometers monitor many radiowave frequencies simultane- 
ously by exposing the sample to a pulse composed of several 
radiowave frequencies. 


3.7.2 Effect of Atomic Identity on NUR 


So far we have considered only the nucleus of 'H. Because 
of the abundance of 'H in biological systems, proton NMR is 
especially informative about the structure and functioning 
of biomolecules and is probably the most popular single 
form of biological NMR. However, in contrast to most other 
spectroscopic techniques, a major strength of NMR is that 
other atoms may be studied in the same way as protons 
allowing us to obtain ‘atom-specific’ information. Nuclei 
which are amenable to study by NMR have a characteristic 
magnetic spin, / (Table 3.6). This is a consequence of pairing 
of individual spins due to neutrons and protons which each 
have m; values of 1/2. Nuclei possessing even atomic mass 
and even atomic number have no net spin and therefore 
I =0 (e.g. ?C, !°O). Several nuclei occurring in Biology 
have J = 1/2 (e.g. 'H, °C, 3'P), while others have J = 1 


Table 3.6. Magnetic properties of some nuclei important in 
biochemistry 


Natural 

abundance y rads”! NMRvatT= 
Nucleus / (%) Ta 2.3488 (MHz) 
1H 1/2 99.98 26.752 100 
H 1/2 0.015 4.107 15.35 
Ke 0 98.9 — — 
3C 1/2 1.10 6.7283 25.144 
4N 1 99.63 1.9338 7.224 
ze) 0 99.76 — — 
S 0 95.02 — — 
31p 1/2 100 10.8394 40.481 
SC] 3/2 75.77 2.642 9.798 
DN 1/2 0.37 —2.7126 10.133 
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(e.g. ?H and '4N) or >1 (e.g. SCl; I = 3/2). This means 
that, in addition to being atom-specific; NMR information 
is also isotope-specific, allowing us to obtain distinct NMR 
spectra for 'H and 7H, for example. 

Each nucleus possessing spin (i.e. J = 0), undergoes a 
characteristic transition from a lower to a higher spin state 
at the resonance condition: 


(3.41) 


where y is a constant property of the nucleus called the 
magnetogyric ratio (sometimes also called the gyromagnetic 
ratio). In practice, this means that we can study individual 
nuclei by varying the field strength thus generating proton 
NMR spectra, 2C NMR spectra, and so on. 


3.7.3 The Chemical Shift 


If our discussion of NMR was to finish at this point, we 
would have a picture of a specific resonance condition for 
each nucleus possessing magnetic spin determined by the 
value of the magnetogyric ratio, y, for that nucleus. What 
makes NMR especially informative, however, is the fact that 
the precise radiation frequency, v, corresponding to the res- 
onance condition for each type of nucleus at a given applied 
magnetic field strength can be affected by its immediate 
chemical environment. For example, a proton forming part 
of a -CH;, group will achieve the resonance condition at a 
different radiation frequency to a proton forming part of a 
-CH; group or an —NH) group. This is due to the effect of 
the magnetic fields of nearby nuclei on that of the nucleus 
undergoing the transition and is known as chemical shield- 
ing. The effect is to alter the magnetic field experienced by 
the nucleus (Ber) such that: 

Be = Bo(1 — 0) (3.42) 
where ø is a shielding constant. 

This phenomenon operates over short distances (approx. 
3-8 A) and allows us to identify characteristic frequencies 
corresponding to particular chemical groups. For example, 
a —CH) group in one chemical structure (e.g. the amino acid 
serine) will achieve the resonance condition at a similar 
v to the -CH» group of a monosaccharide. Intramolecu- 
lar shielding effects like these can be used to determine 
the chemical structure of small molecules from their NMR 
spectra alone (see the example of ethanol below). “Though- 
space’ shielding effects are also possible, however, between 
adjacent chemical groups. For example, if a metal (Table 3.7) 
happened to be located near the nucleus under investigation 
or if two amino acid side-chains were brought into proxim- 
ity in a folded protein, this would cause chemical shielding. 
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Table 3.7. Proton NMR chemical shift values for some common 
chemical groups encountered in biomolecules (nucleus under in- 
vestigation is denoted by arrow) 


Chemical group Chemical shift (ppm) 
TMS 0 
CH3— Metal 205-0 
CH3— CH,;— 0.8-1 
CH3— CH,— CH,— 1.2-2 
1 0 
CGH- C 1.8-2.2 
= N 
| 22-3 
H-C=C— 
| 3.2-3.8 
CH;—O— 
| 3.5-4.5 
CH;—O— 
4.7-5.5 
TE. 
a 
—CH=CH— 4:5-7 
Benzene T 


We shall see later on in this text (Chapter 6) that through- 
space shielding effects such as these make it possible for us 
to determine three-dimensional structures of proteins and 
other biomacromolecules from NMR measurements. 

Largely for instrumental reasons, it is difficult to mea- 
sure v values accurately. To standardize measurements 
between different NMR spectrometers and different exper- 
imental conditions, it is usual to include a reference com- 
pound (normally tetramethylsilane, TMS) with the sample 
to be analysed. The frequency corresponding to the res- 
onance condition for each transition in the sample is then 
expressed as the chemical shift, ô, in parts per million (ppm) 
as follows: 

Vs — VRef 


ô = ———_ x 10° 


VRef 


(3.43) 


where vg and vpet are the frequencies of radiowave radiation 
corresponding to the resonance condition of the sample and 
reference nucleus, respectively. TMS has a chemical shift 
of 0 and each chemical group has a particular value-range 
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Figure 3.51. Proton NMR spectrum of ethanol. An NMR spectrum 
consists of a plot of absorbance intensity versus chemical shift 
(d). Peaks arising from each of the three types of proton in the 
structure are labelled. In the case of -CH3 and CH3 groups, multiple 
peaks arise as a result of spin coupling. Butler and Harrod (1989), 
Inorganic Chemistry: Principles and Applications, Addison Wesley 
Longman, Reproduced with permission. 


as illustrated in Table 3.7. Generally speaking, electron- 
withdrawing groups bonded to the group under investigation 
tend to increase the chemical shift associated with that group 
while electropositive centres (e.g. metals) tend to decrease 
it. Note that the chemical shift may have negative or positive 
values. 

An NMR spectrum consists of a plot of the intensity of 
absorbance of radiowave radiation as a function of chem- 
ical shift (ppm). A proton NMR spectrum for ethanol is 
shown in Figure 3.51 and it is possible to identify chemical 
shifts for each type of proton (i.e. -CH3, -CH2, -OH) in 
this spectrum. Since there is only one type of bonding struc- 
ture which would explain these chemical shifts, this NMR 
spectrum would allow is to write down the structure of this 
molecule if it were not previously known. More definitive 
identification of structure in such spectra is made possible 
by a phenomenon known as peak splitting. In Figure 3.51, 
this is clear in the parts of the spectrum due to the -CH; and 
—CHb) groups where three and four peaks are visible, respec- 
tively, rather than the single peak we might have expected. 


3.7.4 Spin Coupling in NMR 


Peak splitting in NMR spectra arises from a phenomenon 
known as spin coupling or J coupling. We can understand 
this if we look at the splitting of the -CH, ‘peak’ into four 
peaks. This is due to coupling of the spins of the adjacent 
—CH, group. The three protons of the -CH; group can havea 
spin of either +1/2 or —1/2 as previously described (Figure 
3.49). Accordingly, we can write the possible spin arrange- 
ments of these protons as up (t) or down (4) as shown in 
Table 3.8. Two of these arrangements occur singly while 


Table 3.8. Spin coupling between -CH and -CH3 groups in 
ethanol 


Arrangement Intensity 
CH, resonances coupled with -CH3 
Allup ttt 1 
Two up tt) tt U4 3 
One up JN td Wt 
All down | JJ 
CH; resonances coupled with -CH2 
All up tt 1 
One up Jt t4 2 
All down | J 1 


Note: Resonances associated with the -CH3 group are affected by 
the spins of protons on the nearby —CH3 group and vice versa. 
This is called spin-spin coupling. The number and type of possible 
spin arrangements on the coupled group is shown with arrows. 
The NMR peak due to the -CH2 group splits in the ratio 1:3:3:1 
while that due to the -CH3 group splits 1: 2:1. This ratio depends 
on the number of possible spin arrangements in the coupled nuclei. 


two occur in equivalent arrangements three times. The net 
spin experienced in the resonance condition for the -CH3 
group has therefore four possibilities which occur in a ratio 
1:3:3:1. This explains why the -CH, group gives rise to a 
characteristic pattern of four peaks in a ratio 1:3:3:1. 

Generalizing from this example, the number of peaks to 
be expected from a nucleus due to an adjacent equivalent 
nucleus (or set of equivalent nuclei with the same J value) 
is given by the following equation: 


Number of peaks = (2 x n x J) +1 (3.44) 


where n is the number of equivalent nuclei in the chemical 
group. For the -CH3 group resonance condition the value of 
peaks expected as a result of coupling to the nearby -CH3 
protons is (2 x 3 x 1/2) + 1 = 4 peaks. For the —CH; 
group we would predict (2 x 2 x 1/2) + 1 = 3 peaks due 
to coupling with the protons of -CH). The actual spacing 
(in Hertz) between each peak in an NMR spectrum is called 
the coupling constant or J constant. The closer together the 
coupled nuclei are in the sample molecule, the stronger the 
coupling and hence the greater the J constant. The number 
of bonds across which the coupling occurs is conventionally 
denoted as *J 14 — 14 (in our example of ethanol) to denote 
the fact that the coupled nuclei ('H) are separated by three 
bonds. 

The relative intensity of each peak in such multiplets (i.e. 
the area of each peak) may be found as the coefficients 
of a binomial expansion of the form (1 + x)" where n is 
the number of nuclei coupled to the resonance condition 
being measured. For the -CH, and —CH; groups, these are, 
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respectively: 
CH); (1+x)?=14+3x+3x?+1x* (3.45) 
CH3; (1 +x)? =1+2x + 1x? (3.46) 


where the coefficients are 1:3:3:1 (-CH2) and 1:2:1 
(—CH3) as we have just seen (Table 3.8 and Figure 3.51). 
The coefficients arising from peak splitting due to spin cou- 
pling with n nuclei of spin 1/2 can be calculated from the 
above binomial or, more conveniently, may be obtained from 
Pascal’s triangle (Table 3.9). 

What about a situation where two nuclei (A and B) with 
I #0 form part of a single chemical group? We can calcu- 
late the number of peaks to be expected from such a group 
as follows: 


Number of peaks = (2-n-I +1), x (2:m-I+1)g 
(3.47) 


where n and m refer to the number of each nucleus and Z to 
their magnetic spin. For example, the chemical shift due to 
an NH) group (ny = 1; my = 2; In = 1; Jy = 1/2) would 
be expected to result in (2 x 1 x 1 +1) x (2 x 2 x 1/2 +1) 
= 9 peaks. 

It should be obvious from this that a unique pattern of 
peak splitting of chemical shifts would be expected for most 
small molecules such as amino acids, monosaccharides and 
small peptides which allows determination of their chemical 
structure by NMR spectroscopy alone. 


3.7.5 Measurement of NMR Spectra 


NMR spectra are determined experimentally in a specialized 
type of spectrophotometer called an NMR spectrometer (see 
Figure 3.52 for an outline design). The main unique feature 
of this instrument compared to spectrophotometric devices 
already described in this chapter is the powerful magnet 
which is responsible for inducing a magnetic field, Bo in 
the x-axis. Radiowave frequency radiation is generated by 
a transmitter and may either be of a single frequency or in 


Table 3.9. Pascal’s triangle 


n Relative intensities 

0 1 

1 1:1 

2 1:2:1 

3 1:3:3:1 

4 1:4:6:4:1 

5 1:5:10:10:5:1 

6 1:6:15:20:15:6:1 
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Figure 3.52. Outline design of an NMR spectrometer. The sample is placed in a tube called the probe. This is surrounded by wire coil. 
A powerful magnetic field (of field strength Bo) is generated in the x-axis. Radiofrequency radiation is generated by the transmitter which 
generates a magnetic field orientated along the y-axis. When the frequency of this field is the same as the Larmor frequency, the resonance 
condition is met. Absorption of radiation induces a current in the wire coil along the z-axis which is electronically detected. This current is 


proportional to the intensity of absorbance. 


the form of a pulse of frequencies (in the case of Fourier 
Transform NMR spectrometers). A magnetic field (B1) of 
a frequency near the Larmor frequency is generated either 
by this radiowave radiation alone or with the aid of a small 
accessory magnet. This field is orientated at right angles to 
the field generated by Bo, that is along the y-axis. The sample 
is placed in a tube called the probe which is surrounded by 
a coil of wire. When the resonance condition is achieved in 
the sample, a current is induced in the coil of wire which 
is at right angles to both magnetic fields, that is along the 
z-axis. This current is analysed by computer and converted 
into an intensity of absorbance value. More sophisticated 
NMR spectrometers are used to generate multidimensional 
NMR such as two-dimensional NMR. This is discussed in 
more detail in Chapter 6. 

NMR is a particularly versatile analytical technique suit- 
able for both low and high molecular weight molecules. 
Structural and dynamic information can be obtained there- 
fore on samples ranging in size from amino acids and 
peptides to proteins of masses up to 30kDa (using multi- 
dimensional NMR; Chapter 6). While strongest spectra are 
obtained with liquid samples, it is also possible to analyse 
solid samples such as crystals. Indeed, a technique closely 
related to NMR called magnetic resonance imaging (MRI) 
facilitates analysis of the whole human body (or compo- 
nent parts such as head, legs and torso) as an aid to clinical 
diagnosis. 

Chemical shifts obtained from nuclei forming compo- 
nent parts of biomolecules are affected by their chemical 
environment. NMR is therefore especially sensitive to the 


involvement of nuclei in hydrogen bonding or their prox- 
imity to aromatic and carbonyl groups. The technique has 
been widely used for this reason in studies of protein fold- 
ing (Chapter 6), ligand binding, molecular mobility within 
proteins, position and role of metals in metalloproteins and 
in studies of pH effects on proteins. Another set of applica- 
tions, *'P NMR focuses on measurements of phosphorylated 
biomolecules. This makes possible the study of effects on 
biological membranes (which are rich in phospholipids), 
signal transduction systems and processes such as glycol- 
ysis and oxidative phosphorylation. Example 3.6 describes 
a study of this kind on creatine phosphokinase activity in 
intact mitochondria. 


3.8 ELECTRON SPIN RESONANCE 
(ESR) SPECTROSCOPY 


Electron spin resonance (ESR) spectroscopy is based on an 
identical physical principle to NMR. For this reason, several 
aspects of ESR spectroscopy are similar to NMR but the 
following description also highlights some important points 
of difference. 


3.8.1 Physical Basis of ESR Spectroscopy 


ESR arises from the fact that an unpaired electron also pos- 
sesses a spin magnetic moment when placed in a powerful 
magnetic field. Samples containing unpaired electrons are 
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Example 3.6 Determination of creatine phosphokinase activity in intact mitochondria by 31P NMR 


In heart mitochondria it is known that ADP produced by creatine phosphokinase in the following reaction has preferential 
access to oxidative phosphorylation. 


creatine 
phosphokinase 
Creatine + ATP — phosphocreatine + ADP 


A convenient method for monitoring phosphorylated molecules such as adenine nucleotides and phosphocreatine is 
offered by *!P NMR. Because this is an ‘atom-specific’ technique of high sensitivity, it is possible to monitor changes 
in levels of phosphorylated molecules in crude extracts, intact organelles (e.g. mitochondria, intact cells and even intact 
organs such as perfused heart. 

A good illustration of the power of this technique is offered by a study of the rate of creatine phosphokinase activity in 
intact, functional mitochondria from skeletal muscle. The effect of external ADP and ATP concentrations on this activity 
was measured by incubating mitochondria in (a) 0.5 mM ATP or (b) 0.4mM ADP. NMR peaks could be assigned to; (1) 
inorganic phosphate, (2) phosphocreatine, (3-5) y, œ and PB - phosphate of ATP, (6) and (7) B and a-phosphate of ADP, 
respectively in a time-course from 3.84 to 30.71 min. as shown in the following spectra. 

These data allowed quantification of the rate of production of phosphocreatine (peak 2 in (a) and (b) at a range of 
ATP and ADP concentrations. This data which is summarized in plot c) below demonstrated that the Km of creatine 
phosphokinase for each nucleotide differed (KAP? = 63 uM and KATP = 28 uM) while Vmax for each nucleotide remained 
the same. 


(a) (b) 


3.84 min 


0.075 


0.050 


PCr synthesis rate (mM/min) 


0 200 400 
o ATPor e ADP (uM) 


This example demonstrates how NMR allows detailed information to be gathered from a complex biochemical system 
by taking advantage of the sensitive and atom-specific nature of the technique. 


Source: (From Kernec et al., 1996) Kernec, F. et al., (1996) Biochemical and Biophysical Research Communication s 225, 819-825. Reproduced with 
permission of Academic Press, Inc. 
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said to be paramagnetic and ESR is also sometimes re- 
ferred to as electron paramagnetic resonance (EPR). Like 
many nuclei, this spin magnetic moment is quantized with 
allowed spins of only +1/2. However, the spin magnetic mo- 
ment associated with an unpaired electron is approximately 
1000-fold greater than that associated with 'H. The energy 
difference (A E) between the two spin states of the electron 
therefore corresponds to the microwave part of the electro- 
magnetic spectrum rather than the radiowave associated with 
NMR (Figure 3.7). This has the important consequence that 
we can measure ESR spectra independently of NMR spectra 
and vice versa since each technique avails of a distinct part 
of the electromagnetic spectrum. 

The AE value relates to the magnetic field strength, Bo, 
as follows: 

AE=h-v=g-B- Bo (3.48) 
where p is a constant called the Bohr magneton and g is the 
Lande splitting factor. For a free electron in vacuo, this split- 
ting factor has a value of 2.0023. However, it can have dif- 
ferent values for various paramagnetic samples. Moreover, 
in solid samples (e.g. crystals); g can have different values 
depending on the orientation of the paramagnetic species. 


3.8.2 Measurement of ESR Spectra 


Because of their common physical basis, ESR and NMR 
share several points of similarity. Both techniques depend 
on a frequency of electromagnetic radiation corresponding 
to a specific resonance condition. In both techniques, this 
frequency is determined partly by the identity of the chemi- 
cal group containing the magnetic spin moment. In the case 
of ESR, individual chemical groups cause shielding of the 
unpaired electron, resulting in an altered value for g. Thus, 
experimentally-determined values of g may be used to iden- 
tify the paramagnetic chemical group. Peak splitting can 
also arise in ESR as a result of spin coupling with nearby 
nuclei, especially the nucleus actually carrying the unpaired 
electron. The number of peaks we would expect is given by: 


Number of peaks = 27 + 1 (3.49) 


where J is the spin quantum number of the nucleus 
(Table 3.6). The relative intensities of each peak also follow 
a binomial expansion as with NMR and may be determined 
from Pascal’s triangle (Table 3.9). By analogy with Equa- 
tion (3.47), if the unpaired electron was associated with two 
nuclei (A and B), the number of peaks resulting would be a 
product of the splitting induced by each: 


Number of peaks = (2-n- I + 1)a x (2-m-I+1)p 
(3.50) 


where n and m refer to the number of each nucleus and 74 and 
Tp are their respective spins (Table 3.6). Thus, if an unpaired 
electron was associated with an N-H group, six peaks would 
be expected [that is (2 x 1 x 1/24+1)x@x1x1l4+)D= 
6]. In practice, these six peaks would be organized as three 
doublets. As with NMR, the strength of coupling is related 
to the spacing between peaks which is called the isotropic 
hyperfine coupling constant, a. The larger the value of a, 
the greater the coupling and the larger the spacing between 
split peaks. Because of the limited number of combinations 
of nuclei found in biological ESR, peak splitting also helps 
unambiguously to identify the nature of the chemical group 
containing the unpaired electron if this is not previously 
known. 

ESR spectra are measured in an ESR spectrometer which 
is generally similar in design to an NMR spectrometer (Fig- 
ure 3.52). The main differences are that ESR spectrometers 
expose samples to microwave rather than radiowave radia- 
tion and produce derivative rather than absorbance spectra 
(see below). These differences mean that distinct spectrom- 
eters are required for each magnetic resonance technique. 

Despite the many similarities, ESR also shows several 
points of difference to NMR spectroscopy. Largely for in- 
strumental reasons, ESR spectra are conventionally mea- 
sured as the first derivative (dA) of the absorbance (A) as 
a function of frequency (v). The relationship between A 
and dA is shown diagrammatically in Figure 3.53. ‘Peaks’ 
in ESR spectra are therefore referred to as lines consist- 
ing of a peak and trough. ESR spectra are also consider- 
ably more simple and sensitive than NMR spectra. Species 


(a) 
A 
22 Absorption 
(b) Maximum 
First 
p derivative 
dA 


E Minimum 
dik a Aorv 
oo 
Figure 3.53. ESR spectra are recorded as first derivatives of ab- 
sorption. (a) An absorbance spectrum is a plot of A versus à or v. 
(b) A first derivative plot of (a). This is a plot of rate of change of A. 
Note that the maximum and minimum points of this plot correspond 
to the half-way points of (a). 


containing unpaired electrons are quite rare in biology which 
means that much fewer peaks are found in an ESR spec- 
trum of a biological sample than in an NMR spectrum of 
the same sample. Good examples of paramagnetic species 
include molecular oxygen (O2), free radical intermediates 
formed during enzyme catalysis and metals forming part of 
metalloproteins. ESR spectra are measurable at micromolar 
concentrations while NMR spectra are usually obtained in 
the millimolar range. In these respects, the relationship of 
ESR to NMR is reminiscent of that between absorbance and 
fluorescence spectroscopy in that fluors are rarer than chro- 
mophores in biomolecules and are detectable at much lower 
concentrations. 


3.8.3 Uses of ESR Spectroscopy in Biochemistry 


ESR is a particularly useful technique in biological situa- 
tions involving unpaired electrons. Any process either in- 
volving or producing such species may be conveniently fol- 
lowed by ESR. For this reason ESR has been widely used 
in studies of the production of free radicals and in biologi- 
cal processes involving transition metals. Good examples of 
these investigations include studies of the electron transport 
chain in mitochondria, kinetic mechanisms of metalloen- 
zymes and the process of photosynthesis in plant cells. 

As well as studying naturally-occurring paramagnetic 
samples, it is possible to introduce nonbiological param- 
agnetic species into biomolecules. These spin probes are 
stable molecules which naturally contain unpaired electrons 
(Figure 3.54). If they can be attached at a unique site (e.g. in 
a protein or a membrane), they can act as reporter molecules. 
Unpaired electrons are especially sensitive to their freedom 
of movement. The more restricted the movement of a para- 
magnetic centre, the broader the ESR lines corresponding to 
this centre (Figure 3.55). Experimentally, movement can be 
restricted by crystallization (Chapter 6), lowering tempera- 
ture, increased viscosity (e.g. by adding glycerol; Chapter 
7) and by inclusion of the sample in a phospholipid bilayer. 

One of the most useful applications of ESR is therefore in 
identifying and quantifying dynamic mobility of molecules 
such as phospholipids, peptides, proteins and drugs in bio- 
logical systems, especially in membranes. Changes in mo- 
bility, for example due to binding of a protein at the cell sur- 
face, can be detected as a result of effects on ESR spectra. 
ESR also provides an elegant means of identifying trans- 
membrane domains in membrane proteins by site-directed 
spin labelling (Example 3.7). Cysteine residues may be used 
to replace individual amino acids in any protein by site- 
directed mutagenesis. Cysteine-specific spin probes (Figure 
3.54) can be attached at this cysteine in each mutant and 
the mutant proteins can then be incorporated in a phos- 
pholipid bilayer. A clear band-broadening effect is seen in 
ESR spectra when the mutated residue becomes part of the 
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Figure 3.54. Spin labels. These are nonbiological molecules which 
contain an unpaired electron. TEMPO is one of the most popular 
of these and the 4-isothiocyanate derivative is particularly suitable 
for labelling proteins at the N-terminus (Figure 3.55). 5-Nitroxyl 
oxazolidine may be used to label fatty acid chains in, for exam- 


ple, phospholipids, while a methanethiosulphonate derivative of 
OTMPM is specific for labelling -SH groups of cysteine. 


transmembrane domain. Such experiments can provide im- 
portant confirmation of predictions obtained from computer 
algorithms. 


3.9 LASERS 


In order to maximize signal : noise ratios and thus instrumen- 
tal sensitivity, many manufacturers now use beams of narrow 
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(a) 


(b) 
Bilayer 
45°C (liquid) 
(c) 
20°C Bilayer 
(gel) 


Figure 3.55. Effects of immobilization on ESR spectra. A pro- 
tein was spin-labelled with 4-isothiocyanato-TEMPO (Figure 3.54). 
ESR spectra shown are for (a) free spin label, (b) spin-labelled pro- 
tein in a phospholipid bilayer at 45°C and (c) spin-labelled protein 
in a phospholipid bilayer at 20°C. The phospholipids in the bilayer 
undergo a gel-liquid transition at 41°C. Note how much broader 
the lines are as degree of immobilization increases. 


and intense light called laser beams in certain items of labo- 
ratory equipment. Good examples of this include automated 
raman spectrometers (Section 3.6.5), surface plasmon reso- 
nance (Section 3.10), MALDI mass spectrometers (Section 
4.1.3), DNA sequencers and image analyzers (Chapter 5). 
Laser means light amplification by stimulated emission of 
radiation. There follows a brief description of the origin and 
uses of laser beams. 


3.9.1 Origin of Laser Beams 


We have already seen that when atoms undergo an electronic 
transition from a higher to a lower energy level they emit 
light (Section 3.2.2). In a light-source such as a star or a light- 
bulb, light is emitted randomly from many thousands of 
individual atoms and has no particular direction. Moreover, 
there is no direct phase relationship between the light from 
any one atom and that from another (Figure 3.44). This light 
is said to be noncoherent. Coherent light, on the other hand, 
is light where the beam from every atom is both in phase and 


in the same direction. Lasers depend on the production of 
coherent light from a material arising from a process known 
as stimulated emission. 

Under normal conditions, only a small number of a pop- 
ulation of atoms exists in an excited state. By a process 
known as pumping, it is possible to achieve an inversion of 
this situation, called a population inversion, where a major- 
ity of the atoms in the population exist in a partially stable 
excited state. Depending on the characteristics of the active 
material, a number of pumping mechanisms are possible, 
but for crystalline and liquid materials optical pumping is 
one of the most important (Figure 3.56). If a photon of a 
given frequency enters this active material, an atom may 
return to the ground state by emitting a second photon of the 
same frequency, phase and direction as the incident photon 
(Figure 3.57). This pair of photons can stimulate emission 
from two further excited atoms resulting in four photons 
which can, in turn, stimulate emission of eight photons, and 
so on. In a form of chain reaction, each atom returning to the 
ground state by a process of constructive interference (Fig- 
ure 3.44) adds to the amplitude of a unidirectional beam. 
This amplitude is proportional to the number of atoms in 
the population. Since there might be 10!° atoms in even 
a small amount of active material, an intensely strong and 
fine beam of coherent light results from this process (Fig- 
ure 3.58). Reflection of the beam back and forth through 
the active material between two mirrors increases further 
the output of power. One of the mirrors is half-silvered thus 
reflecting half of the photons back into the active material 
to impact on any atoms missed during the first passage. 

Of course, not every material is capable of sustaining 
a population inversion. The first laser was developed by 
Ted Maiman in 1960 using a single crystal of ruby which 
consists of sapphire (Al,0O3) grown with a small amount 
of chromium. Since then, a wide variety of gases, liquids 
and solids have been used as active materials producing 
laser beams covering a wide range of energies. Because the 
active materials in these laser sources have several energy 
levels, photons of a variety of frequencies can result from the 
stimulated emission process. Incorporation of a prism into 
the design facilitates selection of single wavelengths and 
lasers of this type are called tunable lasers (Figure 3.59). 


3.9.2 Some Uses of Laser Beams 


Laser beams are useful in any situation where an intense 
and narrow beam of light of defined energy is required. 
They have found many applications in electronics (e.g. in 
fibre optics) and in medicine (e.g. in eye surgery) and some 
of the items of modern equipment found in life science 
laboratories which incorporate lasers described elsewhere 
in this text have been mentioned. Lasers make possible a 
number of novel high-resolution approaches to the study of 
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Example 3.7 Site-directed spin labeling of bacteriorhodopsin 


Bacteriorhodopsin is a light-driven proton pump consisting of seven transmembrane a-helices. Although the gross 
structure of this protein is known, the detailed orientation of the helices and the exact location of interhelical loops are 
unclear. One possible way to study a problem of this type would be to classify individual amino acid side-chains as being 
(1) buried in the interior of the protein, (2) exposed on the protein surface but in the interior of the membrane or (3) 
exposed on the protein surface in contact with the extracellular solution. 


Membrane 
protein 


Biological 
membrane 


Site-directed spin labelling involves replacing each residue in the region of interest, in turn, with cysteine by site- 
directed mutagenesis thus generating a family of mutant proteins. Each variant of the protein is then individually treated 
with a cysteine-specific spin label such as OTMPM (Figure 3.54). This facilitates measurement of an ESR signal for each 
amino acid side-chain in turn. The ESR signal can be affected by spin relaxing agents, which enhance the relaxation rate 
of the spin label and this type of experiment is known as spin-label relaximetry. 
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A variety of spin relaxing agents are available with individual solubility in aqueous solution and in the lipid components 
of biological membranes. Exposure of a specifically spin labelled mutant to spin relaxing agents can be quantified by 


(Continues)_| 
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M (Continued) 


an accessibility parameter, A P;;2. This value is high in situations where the spin-label is accessible and low where it is 
not. In this example, molecular oxygen (O2) is an effective spin relaxing agent capable of dissolving both in aqueous 
solution and in biological membranes. Chromium oxalate, on the other hand, is only soluble in aqueous solution. A 
spin-labelled mutant which is relaxed by both of these relaxing agents is likely to be on the surface of the protein in 
touch with the extracellular surface (i.e. 3 above) while a mutant only affected by O3 is likely to be on the surface of the 
protein but surrounded by the biological membrane (i.e. 2 above). Spin-labelled mutants not affected by either relaxing 
agent represent residues likely to be buried in the interior of the protein (i.e. 1 above). 

A series of X — Cys mutants in the sequence region 125-142 were generated and spin labelled with OTMPM. 
Independent methods were used to confirm that these mutant proteins had essentially identical structure to the wild- 
type. The mutants were then individually exposed to molecular Oxygen and to chromium oxalate and the accessibility 
parameter determined. The results are as follows. 

This experiment allowed identification of residues 129-131 as being an interhelical region exposed to the cell surface 
and thus sensitive to chromium oxalate. Treatment with O% revealed a regular pattern in the region13 1-138 of exposure 
to lipid followed by exposure to protein with a periodicity of 3.6 residues (i.e. an a-helix). This is an approach which is 
generally applicable to membrane-bound proteins including receptors and ion-channels which can also be used to follow 
three-dimensional structural perturbations. 


(From Millhauser, 1992) Fig 2 from Milhauser, G.L., selective placement of electron resonance spin labels: New structural methods for peptides and 


proteins. Trends in Biochemical Science 17, 448-452. 


biomolecules which are either not feasible or else not as 
sensitive using noncoherent light sources. 

One of the most important applications of lasers is in 
confocal microscopy which makes possible the visualization 
of macromolecular complexes and immuno-labeled proteins 
in cells. A laser beam is focused on a particular point in a 
cell or tissue sample with very high precision. An image of 
a plane through the sample can be constructed by scanning 
the sample in a process called laser scanning. The ability 
to focus a fine laser beam on individual cells also makes 
it possible to excite either natural fluors or extrinsic fluors 
attached to sites in or on the cell. This technique is known as 
laser-induced fluorescence and it can be used, for example, 


Excitation Population Stimulated 
inversion emission 
l 2 3 


Figure 3.56. Population inversion by optical pumping. The atoms 
of the active material have two excited electronic energy levels. 
(1) The electron is promoted to the higher energy level by a flash 
of light. (2) It can spontaneously drop to an intermediate energy 
level. In this energy level the atom is stable for some time allowing 
build-up of a large number of atoms at this energy, i.e. population 
inversion. (3) Impact of a photon causes stimulated emission with 
a return to the ground state (Figure 3.57). 


to detect tumours, dermal lesions or atherosclerotic plaques 
in human tissue samples. 

Recent developments in the application of lasers to bio- 
chemistry have even made possible the illumination and 
study of single molecules. Microscopic plastic beads can be 
trapped in tightly-focused laser beams called optical traps. 
If these beads are attached to a single biomolecule (e.g. a 
DNA molecule or protein), the optical trap can be used as an 
‘optical tweezers’ to manipulate the single molecule. This 
makes possible the study of the mechanics, fluorescence 
and other properties of single molecules under conditions 
of very low signal: noise ratio since the information ob- 
tained is not averaged over a whole population of molecules. 
This experimental approach has enormous potential for the 
study of motion and kinetics of biomolecules crucial to cell 
processes. Special interest has focused on DNA-protein in- 
teractions and molecular ‘motors’ such as myosin, kinesin 
and F,-ATPase. Insights which have been obtained from 
this type of study include the finding that RNA polymerase 
can ‘pull’ DNA during transcription with a force which is 


5 
Photon TE 
-E = 2 photons 4 photons... 


Figure 3.57. Principle of the laser. A photon of light returns the 
electron to the ground state by stimulated emission. This releases 
two photons of the same direction and phase. These impact on two 
atoms resulting in stimulated emission of four photons, and so on. 
Ultimately, this results in a fine beam of coherent light. 
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Half-silvered 


Light source 


Figure 3.58. Outline design of a laser device. Active material is surrounded by a light source (e.g. a flashlamp). This emits an intense flash 
of light to initiate the optical pumping process. Light emitted by stimulated emission is passed back and forth through the cylinder of active 
material by reflection between a silvered and half-silvered mirror (arrows). Half of these photons pass through the half-silvered mirror and 


are emitted from the laser source as an intense laser beam. 


four times as strong as that exerted by myosin during muscle 
contraction. Combination of this approach with fluorescence 
resonance energy transfer (Section 3.4.5) is likely to make 
possible the measurement of distances in the range 5—9 nm. 

Another laser-based method allowing visualization of 
atomic-level resolution in single biomolecules and at mem- 
brane surfaces is atomic force microscopy (AFM) which 
is also sometimes called scanning probe microscopy. This 
technique (Figure 3.60; Example 3.8) brings a cantilever tip 
physically close to the surface to be imaged. The apex of this 
tip is very tiny (of the order of a few nm to tens of nm in ra- 
dius) giving the technique its nanoscale resolution. An ionic 
repulsive force is exerted upwards on the tip which bends 
the cantilever upwards. The amount of bending is measured 
by reflection of a laser spot on to a detector. This technique 
allows exquisitely sensitive imaging of nanosurfaces includ- 
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ing biological membranes. By attaching ligands or proteins 
to the tip, protein-ligand and protein-protein interactions can 
be sensitively determined under physiological conditions. 


3.10 SURFACE PLASMON RESONANCE 


Surface plasmon resonance (SPR) is an optical technique 
which depends on changes in refractive index near metal 
surfaces. When two surfaces, one a metal and the other a 
dielectric material are exposed to a beam of plane-polarized 
light of wavelength, À, a longitudinal charge density wave (a 
surface plasmon) is propagated along the interface between 
them (Figure 3.61). This only happens when one of the sur- 
faces is a metal and works best with silver, gold, copper and 
aluminium. This is because metals contain free oscillating 
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Figure 3.59. Outline design of a tunable laser. The mirror at one end of the tube is transparent allowing light to impact on a silvered prism. 
This diffracts the light into individual wavelengths (e.g. A; — 44). By varying the angle of the prism, any value for à may be selected for 
reflection back into the active material (in this example it is 4;). The laser beam resulting from this will consist only of light of the selected 


wavelength. 


104 PHYSICAL BIOCHEMISTRY 


Photodetector 


Figure 3.60. Atomic force microscopy. A very fine (nm scale) tip on a flexible cantilever is brought near to the object to be imaged. An 
upwards repulsive force bends the cantilever which is detected as movement of a reflected laser spot on a photodetectior. This movement 
allows measurement of the force experienced at the tip. Combining such scans allow construction of three-dimensional images. 


Example 3.8 Transfection of plasmid DNA into cells detected by atomic force microscopy 


The cantilever tip of an AFM probe is very tiny but can be functionalized by addition of ligands or proteins such as 
antibodies and streptavidin. In this experiment, the tips (10nm diameter) were decorated with plasmid DNA encoding 
an enhanced green fluorescent protein (EGFP) (see also Example 3.5). After rinsing, the tips were immediately brought 
near to cultured human embryonic kidney cells. The following images show A. the tip approaching the cell, B. the tip 
penetrating the cell and C. the undamaged tip being removed from the cell. The point of penetration is detected by a 
sharp inflection (*) in the force versus tip-cell distance graph. 


-_- 
2 
£ 
8 
(o 
Ma 


-1,500 -1,000 -500 0 500 1,000 1,500 
Tip-Cell Distance (nm) 


Success of the transfection was confirmed by expression of the EGFP. 


(Reproduced with permission from Ceurrier et al., 2007, Biochem. Biophys. Res. Commun. 355, 632-636). 
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Figure 3.61. Surface Plasmon resonance. Plane-polarized light (wavelength A) arriving at a thin layer (thickness <À) of metal between a 
more (glass) and less (liquid or air) optically dense material is reflected by total internal reflection. An evanescent wave enters the metal 


interface to a depth <À. At a particular angle of incidence, 6, the evanescent wave (vector Key) couples with free oscillating electrons 
(plasmons; vector Ksp) within the metal. Energy absorbed in this process is detected as a sharp reduction in intensity of the reflected light 


at a specific value for 0. Ksp depends strongly on the refractive index of the liquid or air immediately above the metal layer to a depth 


<300 nm. 


electrons called plasmons. When light traveling through an 
optically dense medium such as glass arrives at an inter- 
face with a lower optical density (e.g. liquid), it is reflected 
back into the more optically dense medium, a phenomenon 
called total internal reflectance. However, a component of 
the incident light, the evanescent wave, penetrates into the 
less dense medium to a distance smaller than à. In SPR, two 
optically dense surfaces are used (e.g. glass and liquid) one 
of which is coated with a layer of metal of thickness less 
than À. The value for the wave vector of the evanescent field, 
Key is given by: 


Lo i 
Key = — -ng -sin 
c 


(3.51) 


where po is the frequency of incident light, n, is the refrac- 
tive index of the glass, 6 is the angle of incidence (see Figure 
3.61) and c is the speed of light in vacuo. The wave vector 


of a surface plasmon, K,,, approximates to: 


ôm n? 
2 _ 20 ômini 
P c ôm +n? 
m s 


(3.52) 
where ôm is the dielectric constant of the metal film and 
n; is the refractive index of the dielectric medium, in this 
case liquid. The evanescent wave can couple by resonance 
(hence surface plasmon resonance) with plasmons in the 
metal when Key = Ksp. This happens only at specific values 
of 0. Some energy is lost from the incident light as a result 


which causes a reduction in the intensity of the reflected 
light which can be detected. 

Inspection of Equation (3.52) shows that Ksp is highly 
dependent on the refractive index of the medium above the 
metal film. This holds up to a distance of approximately 
300nm above the metal. Any process altering n, can be 
sensitively detected by SPR so the technique has found ap- 
plications in the study of kinetics and thermodynamics of 
binding processes (e.g. protein-ligand, protein-protein). Be- 
cause SPR only reports on events <300 nm above the metal 
layer, it gives meaningful measurements of binding pro- 
cesses in real time, over a wide dynamic range with direct 
detection and no requirement for labels or special optical 
characteristics in the samples studied. 


3.10.1 Equipment Used in SPR 


Traditional SPR instruments use plane-polarized light of a 
single wavelength, à, and measure the sharp decrease in 
intensity of the reflected beam (Figure 3.62). This happens 
at a particular value of 6, and can vary also with à. This 
design also allows variation in flow rates and temperature. 
The decrease in intensity of the reflected beam is measured 
as decreased reflectivity by a two-dimensional detector ar- 
ray. The surface of the sensor may be coated with a protein, 
a lipid bilayer or a spin-coated polymer. A typical binding 
experiment is shown in Figure 3.63. The sensor surface is 
first coated with a suitable buffer which allows the SPR in- 
strument to establish a baseline. Protein is then added to 
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Figure 3.62. A typical SPR instrument. A convergent beam of plane-polarized light (wavelength, A) is focused through a glass prism onto a 
silver/gold-coated glass slide and reflected to a detector. Solutions can pass through a flow-cell at defined values of flow-rate and temperature. 
Protein flowing through the flow-cell can alter the surface immediately above the metal layer. 
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Figure 3.63. Schematic of SPR protein binding experiment. The sensor is equilibrated with buffer before addition of a protein solution. 
This changes ns immediately above the sensor surface giving a sharp decrease in reflectivity. Protein binding is followed as a change in SPR 
angle (0) over time. When the surface becomes saturated, the SPR angle reaches a maximum. Loosely bound protein is removed by washing 


with buffer. Extent of adsorption is given by the difference between initial and final SPR angles while the rate of binding may be measured 
from the steepest part of the positive slope. 
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Figure 3.64. SPR imaging apparatus. A plane polarized laser light source which provides monochromatic coherent light to a metal-coated 
glass slide. Reflected light is collected on a charge-coupled device camera and recorded. A correction is made for background to yield a 


“clean image”. 


the sensor surface causing a change in n, and a sharp de- 
crease in reflectivity. When protein saturates the surface, n, 
again changes with a second decrease in reflectivity. Buffer 
is then added to the flow cell which removes loosely-bound 
protein. The instrument converts reflectivity measurements 
into a real-time adsorption profile describing binding of the 
protein to the surface. The extent of adsorption is given by 
the difference between initial and final SPR angles while 
the rate of adsorption is measured from the steepest positive 
gradient of the adsorption curve. 

This experimental set-up has the disadvantage of lim- 
ited sample throughput. For this reason, there is now much 
interest in combining SPR with array and imaging tech- 
nologies to allow simultaneous measurement of thousands 
of biomolecular interactions. An SPR imaging apparatus 
consists of a plane-polarized laser light source which pro- 
vides monochromatic coherent light to a metal-coated glass 
slide (Figure 3.64). Reflected light is collected on a charge- 
coupled device camera and recorded. As the intensity of 
reflected light is linearly proportional to that of the incident 
light, a high-intensity laser is best as the light source. How- 
ever, it can be difficult in a protein array to achieve uniform 
illumination with a laser source which results in signal vari- 
ation requiring background correction. Therefore, the crude 
image initially obtained must undergo a background correc- 


tion with image analysis software to produce a clean image 
of the array. SPR imaging instruments can achieve a spatial 
resolution down to 10 um, giving the capability to resolve 
10.000 spots in a 1.4 cm? array (Figure 3.64). 


3.10.2 Use of SPR in Measurement 
of Adsorption Kinetics 


The experimental setup shown in Figure 3.62 allows intro- 
duction of components and measurement of their binding to 
the sensor surface (Example 3.9). The rate of adsorption of 
an analyte is determined from the steepest part of the adsorp- 
tion profile (Figure 3.63). This is dominated by two factors; 
the intrinsic kinetics of binding and mass transport through 
a boundary layer above the sensor. The process of diffusion 
to the sensor surface is generally much slower than the in- 
trinsic rate of adsorption and thus is rate-limiting. However, 
this rate varies with flow rate while the intrinsic kinetics of 
binding do not. The rate of diffusion, Jy, is given by: 


D 
Ja = —(Ap — As) (3.53) 
Xd 


where D is the diffusion coefficient, A, the bulk concentra- 
tion of analyte, A, the surface concentration of analyte and 
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Example 3.9 Surface plasmon resonance identifies a plasmin binding site in blood factor VIII 


A plasma protein called factor VII is essential for clotting of blood and is defective or deficient in people suffering from 
the genetic disease haemophilia A. The protein is synthesized as a large precursor consisting of six domains (A1-A2- 
B-A3-C1-C2). This is cleaved by a protease at the B-A3 junction generating heavy (A1-A2) and light (B-A3-C1-C2) 
chains which circulate in the blood as a complex associated with another protein called the von Willebrand factor. Factor 
VIII is converted into its active form, factor VIIa, by limited proteolytic cleavage by either of the proteases thrombin or 
Xa at Arg-372 and Arg-740 of the heavy chain to produce Al and A2 subunits. Cleavage at Arg-1689 of the light chain 
produces an A3-C1-C2 subunit. Active factor VIa results from cleavage at Arg-372 and Arg-1689. Plasmin and other 
proteases can inactivate factor VIII by cleavage at Arg-372. 

This study used SPR to identify the sites within domain A2 where plasmin interacts. Plasmin (7 ng/mm7) was 
immobilized on the sensor chip and various factor VIII components were added for 4 min. These included intact heavy 
chain, intact light chain, Al and A2 domains. After this, running buffer was changed over 2 min. The SPR data are shown 
below: 
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SPR association/dissociation curves of plasmin binding to factor VIII components. (a) Factor VII (1 = 12.5 nM, 2 
25 nM, 3 = 50nM), (b) Intact heavy chain (4 = 20 nM, 5 = 40nM, 6 = 80 nM), (c) Intact light chain (4 = 20nM, 5 
40 nM, 6 = 80 nM), (d) A2 subunit (7 = 40 nM, 8 = 80 nM, 9 = 120 nM), (e) Al subunit (7 = 40 nM, 8 = 80nM, 9 = 
120nM). 

Nonlinear regression analysis of these data yielded the following binding parameters. The Kas were also determined 
by an independent ELISA assay: 


(Continues)_| 
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M (Continued) 


Binding parameters for the interacton of factor VHI(a) subunit and Ah-plasmin 
determined in SPR-based assay and ELISA-based assay 


SPR-based assay ELISA 
Factor VIII fragment Kass Käiss Kye Ky 
x10*M-!s-! x 103 s7! nM nM 

Factor VII 26.3 + 0.8 0.8 + 0.1 3.1 6.7 + 2.0 
Heavy chain 40.2 + 6.9 2.3+0.1 5.6 13.0+3.1 
Light chain 2.9+0.4 2.0+0.7 68.2 115 11 
A2 14.3 + 6.2 3.24 1.3 22.6 40.7 + 8.7 
Al 5.2 + 0.4 10.9 0.9 208 N.d.? 


From this it was concluded that A2 is preferentially bound to plasmin compared to A1. Since a monoclonal antibody 
recognizing residues 484-509 abolished plasmin binding, mutant forms of A2 were generated in which Lys or Arg residues 
were changed in this region and a panel of 17 mutants and wild type were assayed by SPR as above. This revealed that 
Arg 484 Ala displayed ~2000-fold lower kass (250 M7! s~!) and ~250-fold increased Ky (5800 nM) compared to wild 
type confirming that residue 484 plays a key role in plasmin binding. 


(Reproduced with permission from Nogami et al. (2008) Biochimica et Biophysica Acta 1784, 753-763). 


xa the thickness of the boundary layer. We can define a mass 
transfer coefficient, km, as D/xa. Thus: 


Ja = kn(Av — As) (3.54) 
In general for a flow cell shown in Figure 3.62: 
ke are (3.55) 
Wes ia h-L : 


where v is the flow rate, L is the distance between the flow 
cell inlet and sampling area and A is the height of the flow 
cell. Initially the kinetics are dominated by mass transfer 
and are diffusion-controlled but, as adsorption proceeds, the 
process becomes limited by the intrinsic kinetics of binding. 
From Equations (3.54) and (3.55) we could predict that 
the mass transfer rate can be increased by reducing L or 
increasing flow rate, v, (which both would increase km) or 
by increasing the concentration gradient of the analyte (A, — 
Ag). 

When measuring binding of a ligand (B) to a ligate (A) 
such that A + B — AB, the overall rate of AB complex 
formation is: 


d[AB] 
dt 


= formation of AB — dissociation of AB (3.56) 


= K [A] - [B] — K,[AB] (3.57) 


where K, is the association constant, Kg the dissociation 
constant, [A] the total ligate concentration, [B] is the con- 
centration of free ligand and [AB] the concentration of the 
ligand-ligate complex. The maximum change in the SPR 
angle (Rmax; Figure 3.63) is dependant directly on the total 
ligand concentration ([B] + [AB]). The term (Rmax — R) 
is dependant on [B] and [A] and can be regarded as a con- 
stant, C, since it is constantly replaced under flow conditions 
(Figure 3.62). Based on these relationships we can rewrite 
Equation (3.57) as: 


dR 
— = Ka: C- (Rmx — R) — KaR 


FP (3.58) 


= K, C Rmax i (Ka -C + Ka) -R (3.59) 
A plot of dR/dt against R for a range of ligate concentrations 
can be used to determine rate constants (Figure 3.64). The 
slope of each such plot (ks) is then plotted against C to form 
a straight line where: 

—k,= Ka: C + Ka (3.60) 
K, is the slope of this plot and Ky the y-axis intercept. 


Alternatively, traces can be analysed by nonlinear regression 
using the software provided with the instrument. 
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Figure 3.65. Results of SPR experiment. (a) Reflectivity is plotted 
versus time. (b) dR/dt determined from (a) is plotted versus R for 
a range of ligate concentrations. (c) The slope of each such plot 
(kobs) is then plotted against C to form a straight line Ka is the 
slope of this plot and Kd the y-axis intercept. 


REFERENCES 


General reading 


Schmid, F.X. (1997) Optical spectroscopy to characterize protein 
conformational changes, in Protein Structure: A Practical Ap- 
proach (ed. T.E. Creighton), Oxford University Press, Oxford, 
UK. An overview of the use of fluorescence, absorbance and CD 
as measures of conformational change in proteins. 

Weiss, R.J. (1996) A Brief History of Light and Those That Lit The 
Way, World Scientific Publishing, Singapore. A very readable 
and lighthearted overview of the history of light. 

Wilson, R.G. (1995) Fourier Series and Optical Transform Tech- 
niques in Contemporary Optics: An Introduction, John Wiley & 
Sons, Inc., NY, USA. A mathematical description of the applica- 
tion of the Fourier Transform to optical problems. 


Absorbance spectroscopy 


Mach, H., Volkin, D.B., Burke, C.J. and Middaugh, C.R. (1995) Ul- 
traviolet absorption spectroscopy, in Protein Stability and Folding 
(ed. B.A. Shirley) Humana Press, Totowa, NJ, USA, pp. 91-114. 
A description of the use of absorption spectroscopy in protein 
folding studies. 

Pace, C.N. and Schmid, F.X. (1997) How to determine the molar 
absorbance coefficient of a protein, in Protein Structure: A Prac- 
tical Approach (ed. T.E. Creighton), Oxford University Press, 
Oxford, UK, pp. 253-60. An overview of common intrinsic pro- 
tein chromophores and their uses. 


Fluorescence and chemiluminescence spectroscopy 


Brand, L. and Johnson, M.L. (eds) (1997) Fluorescence Spec- 
troscopy (Methods in Enzymology), Vol. 278 (eds J.N. Abel- 
sonand and M.I. Simon), Academic Press, San Diego CA, USA. 
A comprehensive collection of examples of the applications of 
fluorescence in biochemistry. 

Cheung, H.C. (1995) Resonance energy transfer, in Topics in Flu- 
orescence Spectroscopy; Principles, Vol. 2 (ed. J.R. Lakowicz), 
Plenum Press, NY, USA, pp. 127-76. An overview of principles 
of resonance energy transfer. 

Cubitt, A.B., Heim, R., Adams, S.R. et al. (1995) Understanding, 
improving and using green fluorescent proteins. Trend in Bio- 
chemical Sciences, 20, 448-55. A review of the biochemistry 
and biotechnological applications of green fluorescent proteins. 

Darby, I.A. (ed.) (1999) In Situ Hybridisation Protocols, Humana 
Press, Totowa, NJ, USA. A collection of laboratory protocols for 
in situ hybridization. 

Eftink, M.R. (1991) Fluorescence quenching: Theory and ap- 
plications, in Topics in Fluorescence Spectroscopy; Princi- 
ples, Vol. 2 (ed. J.R. Lakowicz) Plenum Press, NY, USA, pp. 
53-126. A comprehensive overview of principles of fluorescence 
quenching. 

LaRossa, R.A. (ed.) (1998) Bioluminescence Methods and Proto- 
cols, Humana Press, Totowa, NJ, USA. A collection of applica- 
tions of chemiluminescence. 

Mátyus, L., Sjöllğsi, J. and Jenei, A. (2006) Steady state fluores- 
cence quenching applications for studying protein structure and 
dynamics. Journal of Photochemistry and Photobiology B, Biol- 
ogy, 83, 223-36. An excellent review of fluorescence quenching. 


Royer, C.A. (1995) Ultraviolet absorption spectroscopy, in Protein 
Stability and Folding (ed. B.A. Shirley), Humana Press, Totowa, 
NJ, USA, pp. 65-89. A description of the use of fluorescence 
spectroscopy in protein folding studies. 

Schuler, B. and Eaton, W.A. (2008) Protein folding studied by 
single-molecule FRET. Current Opinion in Structural Biology, 
18, 16-26. A review of use of FRET in protein folding studies. 

Sullivan, K.F. (ed.) (2008) Fluorescent proteins, in Methods in Cell 
Biology, Vol. 85, Elsevier. A special issue on fluorescent proteins 
and their uses especially in cell biology. 

Sun, C.X., Yang, J.H., Li, L. et al. (2004) Advances in the study of 
luminescence probes for proteins. Journal of Chromatography 
B, 803, 173-90. A review of spectral probes. 

Torrent, J., Alvarez-Martinez, M.T., Liautard, J.P. et al. (2005) The 
role of the 132-160 region in prion protein conformational tran- 
sitions. Protein Science, 14, 956-67. 

Wolfbeis, O.S. (ed.) (1993) Fluorescence Spectroscopy: New Meth- 
ods and Applications, Springer-Verlag, Berlin, Germany. An 
interesting collection of analytical applications of fluorescence 
spectroscopy. 


Circular and Linear Dichroism 


Capaldi, K.P., Ferguson, S.J. and Radford, S.E. (1999) The greek 
key protein Apo-pseudoazurin folds through an obligate on- 
pathway intermediate. J. Mol. Biol., 286, 1621-32. An interesting 
example of the use of CD in protein folding investigations. 

Eriksson, S., Kim, S.K., Kubista, M. and Norden, B. (1993) Binding 
of 4’,6-diamidino-2-phenylindole (DAPI) to AT regions of DNA: 
evidence for an allosteric conformational change. Biochemistry, 
32, 2987-98. An example of the application of LD to a study of 
ligand binding to DNA. 

Hanson, K.R. (1966) Applications of the sequence rule. I. Naming 
the paired ligands g, g at a tetrahedral atom Xggij. II. Naming 
the two faces of a trigonal atom Yghi. Journal of the American 
Chemical Society, 88, 2731-48. A description of the RS conven- 
tion for enantiomers. 

Johnson, C.R., Morin, P.E., Arrowsmith, C.H. and Freire, E. 
(1995) Thermodynamic analysis of the structural stability of the 
tetrameric oligomerization domain of p53 tumor suppressor. Bio- 
chemistry, 34, 5309-16. A good example of the application of 
CD to investigations of protein structure. 

Kelly, S.M., Jess, T.J. and Price, N.C. (2005) How to study pro- 
teins by circular dichroism. Biochimica et Biophysica Acta, 1751, 
119-39. A review of the application of CD to the study of pro- 
teins. 

Martin, S.R. and Schilstra, M.J. (2008) Circular dichroism and its 
application to the study of biomolecules, in Methods in Cell 
Biology, Vol. 84 (eds J.J. Correia and H.W. Detrich), pp. 263-93. 
A review of CD. 

Rodgers, A. and Norden, B. (1997) Circular Dichroism and Linear 
Dichroism, Oxford University Press. An excellent introduction 
to the theory and practice of CD and LD. 


Infrared spectroscopy 


Barth, A. and Zscherp, C. (2002) What vibrations tell us about 
proteins. Quarterly Review of Biophysics, 35, 369-430. A com- 
prehensive review of infrared and raman spectroscopy as applied 
to proteins. 


SPECTROSCOPIC TECHNIQUES 111 


Cooper, E.A. and Knutson, K. (1995) Fourier transform infrared 
spectroscopy investigations of protein structure, in Physical 
Methods to Characterize Pharmaceutical Proteins (eds J.N. Her- 
ron, et al.), Plenum Press, NY, pp. 101-43. A review of the use 
of FTIR spectroscopy in protein structure determination. 

Franck, P. Nabet, P. and Dousset, B. (1998) Applications of infrared 
spectroscopy to medical biology. Cell and Molecular Biology, 44, 
273-5. A brief overview of the potential of FTIR spectroscopy 
in biomedical research. 

Jackson, M. and Mantsch, H.H. (1995) The use and misuse of FTIR 
spectroscopy in the determination of protein structure. CRC Crit. 
Reviews in Biochemistry and Molecular Biology, 30, 95-120. A 
critical review of the potential of FTIR spectroscopy for protein 
structure determination. 

Larrabee, J.A. and Choi, S. (1993) Fourier transform infrared spec- 
troscopy. Advances in Enzymol., 226, 289-305. A review of the 
theory, instrumentation and some applications of FTIR spec- 
troscopy. 

Stuart, B.H. (2004) Infrared Spectroscopy: Fundamentals and Ap- 
plications, John Wiley & Sons, Ltd, Chichester, UK. An intro- 
duction to the fundamentals of IR spectroscopy. 

van Thor, J.J., Pierik, A.J., Nugteren-Roodzant, I. et al. (1998) Char- 
acterization of the photoconversion of green fluorescent protein 
with FTIR spectroscopy. Biochemistry, 37, 16915-21. A good 
example of the use of difference FTIR spectroscopy. 

Vu, D.M., Myers, J.K., Oas, T.G. and Dyer, R.B. (2004) Probing the 
folding and unfolding dynamics of secondary and tertiary struc- 
tures in a three-helix bundle protein. Biochemistry, 43, 3582-9. 
An application of infrared spectroscopy to measuring rates of 
protein folding/unfolding. 


Raman spectroscopy 


Wang, Y. and Van Wart, H.E. (1993) Raman and resonance ra- 
man spectroscopy. Advances in Enzymol., 226, 319-73. A re- 
view of the theory, instrumentation and applications of Raman 
spectroscopy. 

Zhu, F.J., Isaacs, N.W., Hecht, L. and Barron, L.D. (2005) Raman 
optical activity: A tool for protein structure analysis. Structure, 
13, 1409-19. A review of applications of Raman spectroscopy to 
proteins. 


NMR 


Bushong, S.C. (1996) Nuclear magnetic resonance spectroscopy, 
in Magnetic Resonance Imaging, Mosby, St. Louis, pp. 104-17. 
A non-mathematical description of the basics of NMR. 

Evans, J.N.S. (1995) Biomolecular NMR Spectroscopy, Oxford Uni- 
versity Press, NY. A comprehensive description of the physical 
basis of biological NMR. 

Garcia-Martin, M.L., Garcia-Espinosa, M.A., Ballasteros, P. et al. 
(2002) Hydrogen turnover and subcellular compartmentation of 
hepatic [2-C-13] glutamate and [3-C-13] aspartate as detected by 
C-13 NMR. Journal of Biological Chemistry, 277, 7799-807. A 
good example of in vivo NMR. 

Kernec, F., Le Tallec, N., Nadal, L. et al. (1996) Phosphocrea- 
tine synthesis by isolated rat skeletal muscle mitochondria is not 
dependent upon external ADP: A 3!P NMR study. Biochem. 
Biophys. Res. Commun., 225, 819-25. A good example of 


112 PHYSICAL BIOCHEMISTRY 


application of 3!P NMR to the biochemical investigation of intact 
mitochondria. 

Reid, D.G.. (ed.) (1997) Protein NMR Techniques, Humana Press, 
Totowa, NJ, USA. Some good examples of applications of single 
and multi-dimensional NMR to biochemistry. 

Ruan, R.R. and Chen, P.L. (1998) Nuclear magnetic resonance 
techniques, in Water in Foods and Biological Materials: A Nu- 
clear Magnetic Resonance Approach, Technomic, Lancaster, PA, 
pp. 1-50. An introduction to the basics of NMR. 

Teng, Q. (2005) Structural Biology: Practical NMR Applications, 
Springer, NY, USA. A description of applications of NMR to 
structural biology with an excellent introduction to basics of 
NMR. 

Watts, A. (1998) Solid-state NMR approaches for studying the 
interaction of peptides and proteins with membranes. Biochimica 
et Biophysica Acta, 1376, 297-318. A review of the potential 
of solid-state NMR in the study of membranes with particular 
reference to interactions with proteins and peptides. 


ESR 


Cruz, A., Marsh, D. and Perez-Gil, J. (1998) Rotational dynamics 
of spin-labelled surfactant-associated proteins SP-B and SP-C 
in dipalmitoylphosphatidylcholine and dipalmitoylphosphatidy]- 
choline bilayers. Biochimica et Biophysica Acta, 1415, 
125-34. A good example of ESR studies in bilayer model 
systems. 

Edwards, T.E. and Sigurdsson, S.T. (2007) Site-specific incorpo- 
ration of nitroxide spin labels into 2’ position of nucleic acids. 
Nature Protocols, 2, 1954-62. A method for labeling nucleic 
acids with nitroxide spin labels. 

Hubbell, W.L., Cafiso, D.S. and Altenbach, C. (2000) Identifying 
conformation changes with site-directed spin labeling. Nature 
Structural Biology, 7, 735-9. A review of use of site-directed 
spin labeling in protein studies. 

Klug, C.S. and Feix, J.B. (2008) Methods and applications of site- 
specific spin labeling EPR spectroscopy, in Methods in Cell Bi- 
ology, Vol. 84, Elsevier, pp. 617-58. A review of site-directed 
spin labeling. 

Marsh, D. and Pali, T. (2004) The protein-lipid interface: Perspec- 
tives from magnetic resonance and crystal structures. Biochimica 
et Biophysica Acta, 1666, 118-41. An excellent review of use of 
ESR in studies of membrane proteins. 


Millhauser, G.L. (1992) Selective placement of electron resonance 
spin labels: new structural methods for peptides and proteins. 
Trends in Biochemical Sciences, 17, 448-52. A description of 
potential of ESR spin labelling. 

Russell, C.J., Thorgeirsson, T.E. and Shin, Y.-K. (1999) The mem- 
brane affinities of the aliphatic amino acid side chains in an a- 
helical context are independent of membrane immersion depth. 
Biochemistry, 38, 337-46. An elegant example of application of 
ESR in studies of the interactions between proteins and mem- 
branes. 


Lasers 


Ceurrier, C.M., Lebel, R. and Grandbois, M. (2007) Single cell 
transfection using plasmid decorated AFM probes. Biochemical 
and Biophysical research Communications, 355, 632-6. A fasci- 
nating example of a novel use of AFM. 

Ishijima, A. and Yanagida, T. (2001) Single molecule nanobio- 
science. Trends in Biochemical Sciences, 26, 438-44. A review 
of the potential of single molecule studies. 

Mehta, A.D., Rief, M., Spudich, J.A. et al. (1999) Single-molecule 
biomechanics with optical methods. Science, 283, 1689-95. A 
review of some applications of the single-molecule approach to 
the study of biomechanics. 

Meschede, D. (2007) Optics, Light and Lasers: The Practical Ap- 
proach to Modern Aspects of Photonics and Laser Physics, 
2nd edn, Wiley-VCH Verlag GmbH, Weinheim, Germany. An 
overview of the physics and applications of lasers. 

Moerner, W.E. and Orritt, M. (1999) Illuminating single molecules 
in condensed matter. Science, 283, 1670-6. A review of the prob- 
lems encountered in single-molecule studies. 


SPR 


Boozer, C., Kim, G., Cong, S. et al. (2006) Looking towards label- 
free biomolecular interaction analysis in a high-throughput for- 
mat: A review of new surface plasmon technologies. Current 
Opinion in Biotechnology, 17, 400-5. A review of developments 
in SPR. 

Nogami, K., Katsumi, N., Evgueni, L.S. et al. (2008) Identification 
of a plasmin interactive site within the A2 domain of the factor 
VIII heavy chain. Biochimica et Biophysica Acta, 1784, 753-63. 
An excellent illustration of the power of SPR especially when 
combined with other techniques. 


Chapter 4 


Mass Spectrometry 


Objectives 


¢ The physical basis of mass spectrometry (MS). 
° The range of MS methods available. 


After completing this chapter you should be familiar with: 


* How MS techniques can be interfaced with other high-resolution techniques. 
e How MS techniques can be applied to the study of biomolecules such as proteins, peptides and DNA. 


A wide range of molecular mass is found in living cells vary- 
ing from very tiny ionic species such as Na* to molecules 
made up of many thousands of atoms such as proteins and 
DNA. These molecules possess a variety of mass/charge 
(m/z) ratios, depending on their mass, m, and their charge 
(positive or negative), z. If we pass such a mixture through 
a powerful magnetic field, its components will be deflected 
differently based on this ratio, giving a means of making very 
accurate mass measurements. A variety of ionization meth- 
ods are available to impart charge and momentum to sample 
molecules in the gas phase, which may then pass through a 
magnetic field and be differentially deflected. These meth- 
ods vary in the energy they give to the sample molecules 
and in the degree to which they may cause fragmentation 
of their chemical structure. Depending on the ionization 
method chosen, it is therefore possible to determine a highly 
accurate mass for an intact chemical species or, alternatively, 
to fragment it into smaller pieces. The pattern of fragments 
observed in the latter experiment may be used to deduce the 
complete structure of the original molecule. 

The technique of mass spectrometry (MS) has long been 
used in the study of naturally volatile molecules such 
as lipids. In cases of non-volatile samples, it was possi- 
ble to derivatize them (e.g. acylation or esterification) to 
make them amenable to analysis. Because of limitations 
imposed by ionization, MS has in the past therefore had 
quite limited applications to the study of large, complex 
biomolecules such as proteins. However, the development 
of novel MS ionization techniques has now made possi- 
ble high-resolution mass determinations of large proteins 
(>200 kDa). This provides a particularly useful method for 
analysis of structural variants of proteins (e.g. mutants, post- 
translationally-modified proteins) at resolutions of a few 
atomic mass units (AMU). This chapter will describe the 
physical basis and practice of MS in the study of complex 


biopolymers in biochemistry and show how we can use this 
group of techniques to obtain high-resolution structural in- 
formation about such biopolymers. For more details on use 
of MS in proteomics, the reader is referred to Chapter 10. 


4.1 PRINCIPLES OF MASS 
SPECTROMETRY 


4.1.1 Physical Basis 


Any moving charged species of mass, m, and velocity, v, will 
be deflected by an applied magnetic field (Figure 4.1). The 
magnitude of this deflection will depend on the momentum, 
u, of the species which is given by Equation (4.1). 


H=m-v (4.1) 


Species with large momentum are deflected less than 
those with small momentum. Thus, if a stream of atoms 
and small molecules of identical velocity and charge but 
different mass in the gas phase is passed through a magnetic 
field, the deflection experienced by each atom or molecule 
depends on its mass. This deflection can therefore provide an 
accurate measure of mass. Larger molecules are uncharged 
in the gas phase so, in order to deflect these, it is neces- 
sary to confer a charge upon them. This may be achieved 
by irradiating the molecules with a beam of electrons and 
is called electron impact (ED) ionization (Figure 4.2). Since 
the beam of electrons is of quite high energy, this ionization 
method can cause extensive breakdown of molecular struc- 
ture, splitting a single molecule into a number of fragment 
ions of different mass (Figure 4.3) each of which may be 
deflected differently in the magnetic field. This generates a 
mass spectrum (MS) which is characteristic of the chemical 
structure of the molecule. In fact, the complete structure of 
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Figure 4.1. Differential deflection of charged species in magnetic field. Two species, of identical charge and velocity, will be deflected 
differently by a magnetic field. The species with the smallest momentum, u, is deflected greatest. 
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Figure 4.2. Electron Impact ionization. Sample is bombarded with a high-energy beam of electrons which causes ionization. These ions 
are subsequently accelerated and deflected in the analyzer. 
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Figure 4.3. Ionization of lactic acid by electron impact. Lactic 
acid is ionized by electrons forming a molecular ion. This breaks 
down into a large number of fragment ions, each possessing a 
characteristic mass allowing their identification. The pattern of ions 
generated is characteristic of the chemical structure of the parent 
molecule and can therefore be used to calculate this structure 


small biomolecules may be determined by electron impact 
MS (EI/MS) analysis alone. 

For many years, EI/MS was used in determination of 
structures of relatively low mass species but was not directly 
applicable to larger biomolecules such as proteins and pep- 
tides. The principal reason for this was the inadequacy of EI 
for the generation of ions of sufficiently large mass which 
could be introduced to the gas phase. However, development 
of novel ionization methods combined with other technical 
developments has now made MS analysis of large proteins 
possible. 


4.1.2 Overview of MS Experiment 


A mass spectrometer consists of three distinct components 
(Figure 4.4; Section 4.1.4); a source, analyzer and detector 
all maintained under a powerful vacuum (~1 mPa). The 
source is an ionization chamber where the stream of ions 
is generated. In this overview of MS we will concentrate 
on EI/MS but other ionization methods will be described in 
the following section. It is the possibilities offered by these 
alternative ionization methods which have facilitated MS 
analysis of macromolecules such as proteins and peptides. 
The analyzer maintains either a magnetic or electric field 
which accelerates the stream of ions to a single velocity and 
then differentially deflects them so that they can be detected. 
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Figure 4.4. Overview of MS experiment. A beam of charged ions 
is generated in the source which is accelerated and deflected in the 
analyzer. Ions are detected by the detector. The system is maintained 
under a powerful vacuum. 


Detector 


This separates ions based on differences in their m/z ratio. 
Because the experiment takes place in a powerful vacuum, 
ions are encouraged to exist in the gas phase and are unlikely 
to encounter air molecules. 

It is possible to analyse both positive and negative ions in 
MS systems, depending on the format of the analyzer (N.B. 
cations and anions must be separately analysed). These for- 
mats are called positive ion mode and negative ion mode, 
respectively. In EI/MS, a beam of electrons bombards the 
sample which has diffused into the ionization chamber. This 
efficiently strips the sample molecules of single electrons, 
converting them into positive ions which are called molec- 
ular ions. Since in chemical bonds electrons exist in pairs 
rather than singly, loss of a single electron leaves an unpaired 
electron behind. These ions are therefore also radicals and 
are chemically unstable. The excess energy imparted by 
bombardment with the electron beam leads to fragmentation 
of the molecular ions into fragment ions. This fragmentation 
produces an uncharged free radical and a cation (see Exam- 
ple 4.1). Every possible fragment ion can exist both in the 
form of an uncharged free radical and of a cation (the ratio 
between how much ion and radical are formed depends on 
fragment thermodynamic stability and will accordingly vary 
from fragment to fragment). The molecular ion breaks up 
into nearly every possible size of fragment down to individ- 
ual atoms. Since only ions can be accelerated in the analyzer, 
free radicals play no further part in the MS experiment. 

An alternative, though less common type of EI ionization 
is electron capture ionization, where the sample captures 
an electron from the beam to form an anion as shown in 
Figure 4.5. 

Provided all ions carry the same charge, they will deflect 
in the analyzer according to their m/z ratio that is according 
to their different mass. A spectrum of differing mass can 
therefore be generated from a single chemical species as 
shown in Example 4.1. It is possible to generate ions with 
z > 1, thus allowing analysis of large mass species. 

The ion beam may be detected by a photomultiplier 
detector and converted into an electric signal. This signal 
is amplified by a factor of up to 10° resulting in extremely 
sensitive detection. The spectrum of m/z ratios measured is 
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Example 4.1 Electron impact mass spectrometry of 2-hydroxy-5-nitrobenzy] alcohol (HNB) 


This compound was purified by reversed phase HPLC from commercial dimethyl (2-hydroxy]-5-nitrobenzyl)sulphonium 
bromide (HNBB). On EI/MS the following spectrum was obtained. Suggested allocations of structure to the principal 
ions are as follows: 
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M (Continued) 


NO, 


expected product of hydrolysis of HNBB. 


From this, it was concluded that the species with m/z of 169 represented the parent ion in this analysis and that the 
parent compound, HNB, had the following chemical structure: 


This was further supported by 'H NMR (Section 3.6.4), by elemental analysis and by the fact that this would be an 


OH 


CH,OH 


plotted against intensity normalized to that of the base peak 
(i.e. the peak in the spectrum with the highest intensity). All 
peaks in the spectrum are therefore in the relative intensity 
range 0-100 with the base peak at a value of 100 (Example 
4.1). Peaks found in the spectrum in addition to the base 
peak include the molecular ion (i.e. the intact molecule of 
interest), fragment ions (due to breakdown of the parent ion), 
isotope ions (see below) and background ions. The latter are 
usually due to air-contamination, oil leaks or contaminants 
of the sample. 

Several MS scans are carried out on a sample to yield a 
statistically-averaged spectrum. Especially when interfaced 
with other high-resolution techniques (Section 4.3), many 
spectra may be generated from a single sample. A large 
amount of data can therefore be generated in MS analysis 
which can be difficult to interpret manually. Modern EI-MS 
mass spectrometers contain a library of mass spectra of thou- 
sands of molecules which may be automatically compared 
to experimental spectra. MS spectra derived from analysis 
of protein peptide maps may be automatically compared to 
protein sequence databases facilitating identification of par- 
ticular proteins in the sample. Automated comparisons such 
as these allow rapid identification of sample components 
and their fragments in MS spectra enabling highly-efficient 
throughput of samples. 

Isotope peaks arise from the fact that some atoms in the 
species may exist in a variety of isotopic forms giving a 
range of possible mass. We can calculate the percent proba- 
bility of occurrence of these satellite peaks from the percent 
occurrence of each isotope in nature (Table 4.1). 

For small molecules, the presence of isotopes leads to 
quite small mass differences between MS fragments differ- 
ing only in isotope content and these can safely be ignored 
for most purposes. However, for large macromolecules con- 
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Figure 4.5. Electron capture ionization. 


taining up to thousands of C, H and N atoms, the ef- 
fect on calculated mass can be large. Moreover, the use 
of a simple atomic weight scale where C=12, H=1 and 
O=16 rather than the accurate monoisotopic atomic weight 
(Table 4.1) increasingly leads to inaccuracy in nominal 
molecular weight as molecular size increases. For example, 
a small protein with the atomic formula C150H3100200N160 
Ss might have a nominal molecular weight of 7.742 kDa 
with a simple atomic weight scale but the monoisotopic 
mass would be 7.744 kDa when estimated with the accurate 
atomic mass scale given in Table 4.1. If we allow for the 
presence of naturally-occurring isotopes we could calculate 
an isotopically-averaged value of 7.747 kDa. 


Table 4.1. Mass and abundance of isotopes found in biomolecules 


Isotopic mass Abundance 

Element Isotope (AMU) (%) 

H 1A 1.00784 99.985 
2H 2.01410 0.015 

C 12C 12.00000 98.90 
BC 13.00336 1.1 

N MN 14.00307 99.634 
ISN 15.00011 0.366 

(0) 160 15.99491 99.762 
170 16.99913 0.038 
180 17.99916 0.2 

P 31p 30.97376 100 

S 225 31.97207 95.02 
35 32.97146 0.75 
345 33.96787 4.21 
355 35.96708 0.08 

Cl 35C] 34.96885 75.77 
37C] 36.96590 24.23 

K 39K 38.96371 93.256 
K 39.96400 0.012 
“K 40.96183 6.73 
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Table 4.2. Summary of main MS ionization modes used in the study of biomolecules 


Ionization mode Basis of ionization 


Nature of sample 


Principal applications 


Electron impact Molecule loses or gains Vapour 
ŒD electron from beam of 
electrons in vacuo 
Chemical Beam of electrons passes Vapour 


ionization (CI) through large excess of 
ammonia or methane. 
Molecule gains H* to form 


quasimolecular ion, M + H 


Fast atom High-energy beam of argon or 
bombardment xenon causes 
(FAB) high-temperature spike 


which volatilizes and 
ionizes sample 

Beam of plasma ionizes and 
desorbs sample 


Plasma desorption 
ionization (PDI) 


Matrix-assisted High-energy laser (UV) causes 


plasma sample and matrix to 

desorption “sputter” into the gas phase 

ionization 

(MALDI) laser 
Electrospray A fine spray of charged 


ionization (EST) droplets is generated. 
Matrix is evaporated leading 


to expulsion of ions 


ESI in which a flow of 
nebulising gas is also used 


Ion spray 
ionization (ISI) 


4.1.3 Ionization Modes 


The description of EI/MS in the preceding section serves as 
an introduction to the basic theory and practice of MS His- 
torically, this was the first and most widespread ionization 
procedure used. However, in practice this technique is lim- 
ited to small, stable and volatile sample components and is 
unsuited to macromolecules such as peptides and proteins. 
We will now review other ionization methods some of which 
allow informative MS analysis of larger mass molecules. A 
summary of the principal ionization methods used in MS of 
biomolecules is provided in Table 4.2. 

Chemical ionization (CI) involves ionization of sample 
in the presence of a large excess of a reagent gas such 
as ammonia or methane. A beam of electrons similar to 
that used in EI is passed through this gas leading to the 
formation of radicals such as CH} and NH}. Unlike EI, 
which generates a chemically unstable radical, CI generates 
a stable ion from each sample component due to proton 
transfer from the reagent gas to the molecule (Figure 4.6). 


Liquid matrix (e.g. 
glycerol) 


Deposition on matrix 
(e.g. nitrocellulose) 


Sample mixed with solid 
matrix (e.g. sinapinic 
acid) which absorbs 
strongly at Imax of 


Liquid matrix (aqueous 
or aqueous/organic) 


Determination of structure of small 
molecules by interpreting 
fragmentation pattern obtained. 
Unsuitable for proteins 

Accurate mass of intact M + H. 
Unsuitable for proteins 


Accurate mass of intact M + H or M — H 
ions. Suitable for peptides or small 
proteins 


Gives mass of large, intact samples such 
as proteins (<20 kDa). Uses TOF 
analyzer 

Gives mass of large, intact 
macromolecular ions. Uses TOF 
analyzer samples such as proteins 
(<20 kDa). Uses TOF analyzer 


Gives mass of intact macromolecular 
ions. Uses Formation of 
multiply-charged ions facilitates very 
accurate mass determination. Usually 
uses quadrupole analyzer 

High flow-rates facilitate interfacing with 
HPLC. Suitable for studies with 
proteins 


This ion is called a guasimolecular ion and it is one AMU 
greater than the mass of the molecule itself (often referred 
toas M + H). The stability of this ion is advantageous as the 
mass of the species generated can be accurately determined. 
It is disadvantageous, however, in that no fragment ions are 


Methane; 
CH4 ——> CH4**, CH3+*, CH2** 


CH4*° + CHy ——> CH;* + CH3- 
M + CH;t ——> MH* + CH3 


Ammonia; 
NH3 ——> NH;3**, NH2 +°, NH*° 


NH3** + NH; ——> NH4* + NH. 
M + NHy+ ——> MH*t + NH; 


Figure 4.6. Chemical ionization of particle of mass, M, with 
methane and ammonia. 
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Figure 4.7. Continuous Flow Fast Atom Bombardment (FAB) Ionization. Sample, held in a liquid matrix, is bombarded with a beam of 
atoms such as Xenon. Ions are formed in a dense gas which may be accelerated towards the analyzer and detector. In continuous flow 
(dynamic) FAB, sample may flow continuously in a capillary passing through the probe as shown. 


formed in the process and, therefore, little structural infor- 
mation is generated from this ionization method. In prac- 
tice, both EI and CI spectra can be generated from a given 
sample generating complementary information on chemical 
structure and mass, respectively. As with EI, it is still nec- 
essary to vaporize the sample before ionization. This means 
that CI shares some of the limitations of EI, being directly 
useful mainly in MS analysis of small, stable and volatile 
molecules and unsuitable for biological macromolecules. A 
partial exception to this is atmospheric pressure chemical 
ionization (APCI) which is a specific form of CI described 
briefly at the end of this section. CI is also an important facet 
of many of the ionization procedures which allow MS anal- 
ysis of large molecules such as proteins (e.g. the formation 
of quasimolecular ions). 

These procedures differ from each other mainly in their 
ionization mode. The experimental problem is to generate in- 
tact, singly- or multiply-charged ions from macromolecules 
such as proteins in the gas phase. Most ionization modes 
used with macromolecules depend on either bombardment 
of the sample by a beam of energetic particles or else on 
evaporation of sample into the gas phase. A common fea- 
ture of these methods is the use of some form of matrix in 
which sample is bombarded or from which it may be evap- 
orated. The matrix, which is usually liquid but is a solid in 
some ionization modes, serves to isolate sample components 
and prevent their aggregation. It provides a reservoir from 
which ions may be gently recruited into the vapour phase 
without resulting in extensive fragmentation. The matrix 
may be regarded as a sort of ‘platform’ which is removed 
after volatilization leaving single sample ions in the gas 
phase. 

Fast Atom Bombardment (FAB) ionization uses a high- 
energy (8-30 keV) beam of neutral atoms such as argon or 
xenon, accelerated to high velocities to cause both volatiliza- 
tion and ionization of the sample (Figure 4.7). The material 
to be analysed, dissolved in a liquid matrix such as glyc- 
erol, is placed on the surface of a probe and, as a result of 


bombardment by the beam of atoms, a dense gas forms im- 
mediately above the sample. Ionization of sample molecules 
in this gas is thought to be due to the generation of an ex- 
tremely transient increase in temperature called a high tem- 
perature spike which is of sufficient duration to generate ions 
but too short to cause heat-induced bond breakage. The gas 
contains a mixture of anions, cations and neutral molecules 
(which can be subsequently ionized by CI interactions with 
charged ions in the gas, see below). Alternatively, ions may 
be preformed in the matrix and desorbed by the gas bom- 
bardment. By varying the voltage, anions or cations can be 
accelerated towards the analyzer. 

If sample were merely deposited on the probe as a dry 
residue, there would be a rapid decrease in yield of sample 
ions. However, introducing the sample in a liquid matrix 
allows continuous replenishment of sample ions from the 
matrix surface. Moreover, in the gas phase the matrix be- 
haves in a similar manner to the reagent gas in CI shown 
in Figure 4.6, generating quasimolecular ions (either pro- 
tonated (M + H) cations or deprotonated (M — H) anions) 
from uncharged molecules of the sample in the gas phase. 
In FAB, the matrix itself generates characteristic ions which 
are called matrix cluster ions which are useful in calibration 
of spectra. 

A limitation of FAB arises from the fact that there seems 
to be a significant difference in ionization efficiency between 
molecules that exist on the surface of the liquid matrix and 
those which are buried inside it (i.e. between sample compo- 
nents of differing hydrophobicities and surface activities). 
Molecules of the former type are much more readily ionized 
than those of the latter. This is called selective desorption or 
suppression effect and it complicates analysis of FAB/MS 
spectra since the precise composition of the sample may 
not be known prior to the analysis. Suppression effects are 
also a problem with other ionization methods and can be 
avoided by separating the sample before MS by some other 
high-resolution technique such as HPLC (Chapter 2; Sec- 
tion 4.3.1) or electrophoresis (Chapter 5; Section 4.3.3). 
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Figure 4.8. Plasma Desorption Ionization (PDI). 7°?Cf breaks down to form two particles which travel in opposite directions at the same 
velocity. One of these activates a time counter (at to) while the other, causes vapourization and ionization of the sample. Smaller mass ions 
travel down the drift tube of the TOF analyzer more quickly than larger ones. When ions arrive at detector, time of flight (t) is calculated. 


Continuous analysis of samples eluting from such separa- 
tions, which is called continuous flow FAB or dynamic FAB, 
ensures that no component will be overlooked due to its 
different ionization efficiency (Figure 4.7). 

Plasma desorption ionization (PDI) is similar to FAB ex- 
cept that it involves bombardment of sample by a beam 
of plasma (i.e. atomic nuclei stripped of their electrons 
(Figure 4.8). This is achieved with the radioactive isotope 
Californium, 7°*Cf, which breaks down to emit a stream 
of emission nuclei. Typical products, which are simultane- 
ously emitted in opposite directions at identical velocities, 
are ®Ba* and !8Tct. Sample is coated onto a suitable sub- 
strate such as aluminized mylar in a matrix composed of 
either nitrocellulose or glutathione. It is then placed in front 
of the source and exposed to the plasma beam which imparts 
sufficient energy to each component of the sample to cause 
ionization and to desorb these ions into the gas phase for 
analysis. 

PDI is used with a time-of-flight (TOF) analyzer (see fol- 
lowing section) which measures the time required for each 
ion to travel to the end of a tube called the drift tube where 
they are detected. A feature of the plasma source is that, 
simultaneous with emission of the plasma product which 
causes sample ionization, a second product is emitted in the 
opposite direction with the same velocity. This is used to be- 
gin a time counter which allows accurate estimation of zero 


time and, thus, accurate determination of the time-of-flight. 
Since ions generated in PDI have the same momentum, ions 
of smaller mass travel down the drift tube more quickly than 
those of larger mass (i.e. they have greater velocity). From 
the time measured for each ion, it is therefore possible accu- 
rately to measure ion mass. Like FAB this ionization tech- 
nique is regarded as quite gentle and rarely results in frag- 
mentation of sample molecules. Because of development of 
other ionization methods, PDI is now mainly of historical 
interest and has been largely superseded by matrix-assisted 
laser desorption ionization (MALDI). 

MALDI depends on bombardment of sample (held 
cocrystallized in a suitable matrix) with pulses of high- 
energy laser radiation in the UV part of the electromagnetic 
spectrum (Figure 4.9). The nature of the matrix is critical to 
this ionization method since it absorbs much of the energy 
from the laser, avoiding significant breakdown of sample 
structure. The matrix (e.g. 2,5-dihydroxybenzoic acid) is a 
solid present in large excess and is composed of molecules 
which absorb the radiation energy at the wavelength of the 
coherent laser light, unlike the sample components. At ul- 
traviolet wavelengths, conjugated structures are especially 
useful as matrix components. When the sample is exposed to 
the high-energy radiation of the laser, matrix and intact sam- 
ple molecules are vapourized in a jet. They are said to ‘sput- 
ter’ into the gas phase where sample molecules are ionized 
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Figure 4.9. Matrix-Assisted Laser Desorption Ionization (MALDI). A pulse of laser light (UV) is directed at the sample which is cocrys- 
tallized with a suitable matrix component. This matrix component absorbs most of the energy of the laser. Both matrix and sample are 


vapourized by the laser. In the gas phase, proton exchange occurs between matrix and sample components producing M + H ions. 


by the excited matrix molecules, presumably by a process 
of proton transfer similar to CI. A molecular ion (M + H) is 
generated by this transfer. Although singly-charged ions pre- 
dominate in MALDI spectra, ions which are multi-charged 
are also formed. Because of the fact that laser radiation is 
used discontinuously as a pulse, this ionization mode lends 
itself to use of TOF analyzers. It is possible therefore to 
use this technique to analyse large macromolecules such 
as proteins, glycoproteins, oligonucleotides and oligosac- 
charides. The technique is especially versatile allowing MS 
analysis without significant sample fragmentation. A disad- 
vantage is the difficulty of observing low molecular mass 
ions (e.g. <1000 Da) due to the presence of ions due to the 


matrix. This can be avoided by varying the choice of matrix 
material. 

The preceding ionization modes all depend ultimately on 
bombardment of sample for the generation of ions. Elec- 
trospray ionization (ESI) is an evaporation-based ionization 
mode. It involves the generation of a fine spray of charged 
liquid droplets at atmospheric pressure from which the ma- 
trix is later evaporated (Figure 4.10). Because ionization 
takes place at atmospheric pressure rather than in a vacuum, 
this ionization mode is also sometimes called atmospheric 
pressure ionization (API). Sample is again introduced in a 
liquid matrix which is either aqueous or aqueous/organic. 
Typically, sample is sprayed from a very fine metal syringe 


N, 
° 
© 
4kV Se rod To analyzer 
Sample in O og. 2 and detector 
—> @ @ 22 0, 83 — 


Sample syringe 


Spray formation 


Matrix evaporation 


Figure 4.10. Electrospray ionization (ESI). Sample is converted to a fine spray of droplets. Matrix is evaporated from this, leading to 
Coulombic explosion and the liberation of ions which then are directed to the analyzer and detector. In ion spray ionization (ISI), the sample 


is surrounded by a sheath of gas such as Nọ which imparts greater energy and hence facilitates faster input of sample to the system. 
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Figure 4.11. Generation of ions during ESI. Charged droplets are 
generated in a fine spray. When exposed to warm, drying N2 gas, 
they undergo evaporation of matrix. Charge density increases as 
a result of decreasing droplet size, leading to an unstable droplet 
which explodes in a Coulombic explosion. Thus ions are liberated 
into the gas phase where they are accessible to MS analysis. 


maintained at 4 kV. Dry gas, heat or a mixture of each is 
applied to the droplets which, because of the powerful elec- 
tric field, are now highly-charged. This causes the solvent 
to evaporate, in turn, leading to a decrease in droplet size 
and a large increase in charge density on the droplet sur- 
face (Figure 4.11). This leads to increasing instability in 
the droplet resulting in a Coulombic explosion in which the 
droplet generates a number of smaller droplets. Still smaller 
droplets may be formed from these by an identical process 
until, ultimately, sufficiently small droplets are formed in 
which the field due to excess charge is large enough to ex- 
pel ions. These ions enter the gas phase where they may 
be directed into a mass analyzer. This ionization method 
requires relatively small inputs of thermal energy and re- 
sults in gentle ionization of sample components with little 
attendant fragmentation. Accordingly, it is especially use- 
ful in the ionization of intact large macromolecules such as 
proteins. 

As well as generating singly-charged ions, a particular 
feature of ESI is the generation of multiply-charged ions. 
On average, a charge is added for every kDa of mass with 
protein samples. This means that, since detectors with rela- 
tively limited mass ranges such as quadrupole detectors (see 
Section 4.1.4) measure m/z ratio and not just mass alone, it 
is possible to determine the mass of molecules outside the 
normal mass range of the analyzer. For example a molecule 
of 20 kDa possessing 20 charges has a m/z ratio equal to that 
of a singly-charged molecule of mass 1 kDa. Therefore, ESI 
brings large macromolecules into the normal mass range 
of MS analyzers. Moreover, the distribution of multiply- 
charged peaks in the MS spectrum forms a characteristic 
‘fingerprint’ for each ion and allows determination of an av- 
erage relative molar mass. This is a more accurate estimate 
of mass than single m/z measurements of singly-charged 
ions. 
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Figure 4.12. Atmospheric Pressure Chemical Ionization. Sample 
is vapourized by heat and nebulizing gas. The vapour contains 
both sample and solvent. Solvent is ionized by a corona discharge 
from the nearby discharge electrode. Sample molecules are in turn 
ionized by the excited solvent molecules in a CI process. The vapour 
is sampled and subsequently analysed by MS. 


A version of ESI in which a flow of nebulizing gas (usually 
Nitrogen) surrounds the spraying needle within a circular 
sheath is called ion spray ionization (ISI). This allows the 
input of greater energy into droplet formation facilitating the 
use of higher sample flow-rates (up to 1 ml/min compared to 
10 ul/min for conventional ESI) and is capable of tolerating 
higher water levels in solvent. 

In discussing CI above, it was pointed out that this ion- 
ization mode was not really very suitable for direct analysis 
of macromolecules since it is difficult to vapourize these 
without at the same time destroying their structure. Atmo- 
spheric pressure chemical ionization (APCI) is a process 
in which sample is nebulized at atmospheric pressure by a 
combination of a stream of gas and direct heating (Figure 
4.12). This produces a vapour which may be then ionized 
by acorona discharge from an electrode placed immediately 
after the nebulizing inlet. Ions are formed by CI and are then 
accelerated and analysed in the usual way. Because this ion- 
ization mode can accommodate relatively large flow rates, it 
is conveniently interfaced with HPLC to allow MS analysis 
of peaks which have been chromatographically separated 
(Section 4.3). However, because heat is used to volatilize 
sample, this ionization mode is not appropriate for studies 
of proteins and peptides. 


4.1.4 Equipment Used in MS Analysis 


The formation of gas phase ions is a prerequisite for MS ex- 
periments. These ions are formed in the source. In EI/MS, 
this is an evacuated chamber into which sample may be in- 
troduced as a vapour (Figure 4.13). A thin piece of metal 
such as rhenium or tungsten which is called a filament is 
heated to 2000 K. At high temperature, metals can lose elec- 
trons by a process of diffusion. This happens more quickly if 
an electric potential is also applied to the heated filament. A 
potential of 70 V generates a beam of electrons with energy 
of 70 eV. These 70 eV electrons stream across the chamber 
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Figure 4.13. EI/MS source. Sample (shown as circles) is intro- 
duced as vapour into evacuated chamber. Electrons diffuse from 
the heated filament at energy of 70 eV. The electron beam ionizes 
sample molecules into anions or cations which are accelerated by 
an electric potential towards the analyzer. 


and interact with the diffusing sample molecules (which are 
neutral) and converts them either into anions or cations as 
described in Section 4.1.2. In order to increase the ease with 
which electrons diffuse from tungsten filaments, they are 
often coated with thorium oxide. A similar source is used 
for CI/MS but with the proviso that it requires to be more 
gas-tight than that used for EI/MS, to accommodate the high 
gas pressure (1 torr) required. 

In bombardment ionization methods, a beam of high- 
energy atoms or ions needs to be generated. In FAB, a 
fast atom gun is used to generate a beam of high-energy 
(3-10 keV) argon, Krypton or xenon atoms while a higher- 
energy beam of caesium ions (20-30 keV) may instead be 
generated by a caesium gun. For MALDI a laser source 
generates a beam of UV laser light. 

The analyzer has the function of separating ions based 
on their different m/z ratios. Before looking at the main 
types of analyzers available for MS of macromolecules, it 
is important to define resolution, R, in the context of MS 
This is the ability of a mass spectrometer to distinguish 
between ions of different m/z ratios. It may be calculated 
from Equation (4.2); 


R= — (4.2) 


where M is the mass of an MS peak of interest and AM is 
it’s width at 50% of it’s height. A system with a resolution 
of 500 can distinguish between ions of m/z of 500 and 501, 
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Figure 4.14. Magnetic focusing of ions in MS A detector slit is 
placed at a distance, R, from the centre of trajectory arc. By varying 
field strength or accelerating voltage, the beam may be focused 
through a number of possible trajectories (1 to 3). Similarly, beams 
of different ions may be detected in this way. 


while one with a resolution of 4000 can distinguish between 
this mass and 4001. 

When an ion, accelerated with a voltage V, enters a con- 
stant magnetic field of strength B, it will tend to follow a 
circular path of radius R according to Equation (4.3); 


m — B-R? 


Z 2V 


(4.3) 


If a detector slit is placed at a constant distance, R, from 
the centre of the trajectory arc, it is possible to focus ions 
on this slit by varying either the field strength B or the ac- 
celerating voltage V (Figure 4.14). The maximum value for 
B determines the upper limit of m/z attainable at each value 
of V. This technique, magnetic focusing, is inadequate for 
high-resolution work since ions of a single m/z ratio enter- 
ing the magnetic field or sector will often possess a wide 
range of velocities and therefore have a range of momentum 
values. They would therefore describe a range of possible 
arcs as shown in Figure 4.14. However, if the magnetic sec- 
tor is preceded by an electrostatic analyzer to ensure that 
all ions entering the magnetic field have the same velocity, 
high-resolution of m/z is possible. This type of analyzer is 
called a double-focusing magnetic sector analyzer and is 
shown schematically in Figure 4.15. The upper limit of m/z 
attainable with such an instrument is approx. 10000. 

An alternative analyzer suitable for work with lower m/z 
values (up to 4000) is the guadropole mass analyzer (Figure 
4.16). In this analyzer, two pairs of parallel rods are ar- 
ranged on either side of the beam of ions. Opposite rods are 
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Figure 4.15. Double-focusing magnetic sector analyzer. Ions are deflected in the electrostatic sector in a manner proportional to their energy 


and independent of their mass. Thus a beam of ions of identical energy is selected. Deflection in the magnetic sector depends on m/z ratio. 


connected electrically while adjacent rods are connected by 
a voltage which has radiofrequency (RF) and direct cur- 
rent (DC) components. By varying the radiofrequency, it 
is possible to select for single m/z values, since all other 
ions collide with the rods and are lost. Because only a small 
number of m/z ratios may traverse a path along the axis 
of the rods, this analyzer is also called a quadropole mass 
filter. Particular advantages of these analyzers are the fact 
that they are lightweight, cheap, can be rapidly switched 
between positive and negative ion modes and, since they 
can accommodate higher gas phase pressures than sector 
analyzers, they are especially suitable for interfacing with 
GC and HPLC systems. 

Time of flight (TOF) analyzers (Figure 4.17) can measure 
m/z values up to 200-500 kDa. In this system, ions are ac- 
celerated down a long tube called the drift tube which is 
free of magnetic field. The time required for each ion to 
arrive at the detector (i.e. the time of flight) is measured and, 
from this, the m/z is calculated. These detectors lend them- 
selves especially to discontinuous ionization methods such 
as MALDI and PDI (see previous section). 
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Figure 4.16. Quadropole mass analyzer. Varying the radiofre- 
quency (RF) component of voltage allows ‘tuning’ for specific m/z 
ratios. Ions with either larger or smaller m/z collide with rods and 
are lost. 


Ion cyclotron resonance Fourier transform MS (FTMS) 
analyzers are in some respects similar to quadropole detec- 
tors. Ions are induced to move in circular (hence cyclotron) 
orbits within an electrostatic field held within the solenoid 
of a powerful magnet which generates a field of strength, B. 
The angular frequency of motion, w, of a particular ion is 
given by Equation (4.4); 


z: B 
w = — 
m 


(4.4) 


The ions are exposed to an alternating electric field and, 
when the frequency of this field is the same as œw, the ions are 
steadily accelerated to larger and larger radii and, simulta- 
neously, move into phase with each other. Since w depends 
on m/z, this ratio can be determined in the analyzer for indi- 
vidual ions to a very high degree of accuracy. The resolution 
attainable with this analyzer is > 1 000 000. What makes this 
type of detector unusual is that ions are not directly detected 
by impacting on the detector but rather by their effect as they 
pass near plates within the detector which induces a current 
in the plates consisting of sine waves called a free induc- 
tion decay (FID). As will also be described later (Sections 
6.2.1), a continuous wave function like the FID can be con- 
verted by a mathematical tool called the Fourier transform 
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Figure 4.17. Time of flight analyzer. Ions are accelerated down a 
long tube (drawing not to scale) and the time required for arrival at 
detector is measured. The m/z ratio may be calculated from this. 
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Figure 4.18. Ion Trap Analyzer. Ions are either created within or 
rapidly injected into the trap. The electrostatic field forces them into 
specific orbits, depending on the m/z ratio. By varying this electro- 
static field, ions may be selectively ejected in order of increasing 
m/z from the trap and detected. 


(Appendix 2) into a discontinuous function, in this case a 
mass spectrum. 

Ton trap analyzers (Figure 4.18) are part of a family of an- 
alyzers called quadropole ion storage systems. This device 
consists of three electrodes, two of which are the end cap 
electrodes while the third is the ring electrode surrounding 
the central part of the analyzer. Ions pass into this analyzer 
and are held in constant orbital motion within it by the elec- 
trostatic field created by the electrodes. By variation of the 
precise field conditions, ions are selectively ejected in order 
of increasing m/z ratio and detected. 


4.2 MASS SPECTROMETRY OF 
PROTEINS/PEPTIDES 


4.2.1 Sample Preparation 


MS has become especially popular for the structural inves- 
tigation both of mixtures of proteins/peptides and of highly- 
purified proteins/peptides. Because of the extremely high 
mass accuracy of these techniques (+0.01%), it is possi- 
ble to obtain detailed structural information on aspects of 
protein structure such as amino acid sequence, chemical 
and post-translational modification (e.g. N-terminal modi- 
fication, glycosylation), disulfide bridge formation and to 
compare closely-related proteins (e.g. mutants/wild-type) 
by peptide mapping. It is often necessary in such stud- 
ies to pretreat samples with specific chemical agents be- 
fore carrying out MS analysis. Normal precautions such 
as prior cleavage, carboxymethylation or reduction of the 
sample should be carefully included in the sample prepa- 
ration before MS analysis but care should be taken that 
reagents used in such treatments do not interfere in the 
analysis. Interfacing of MS with other high-resolution tech- 
niques (Section 4.3) greatly extends its analytical poten- 
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tial and introduction of sample to MS by HPLC or elec- 
trophoresis may often facilitate removal of contaminants 
such as buffer components or chemical agents used in pre- 
treating sample. In other situations, it may be desired to 
introduce the sample directly to MS in a one-dimensional 
analysis. 

Before analysing proteins or peptides by MS, it is nec- 
essary to remove salts present in the sample to avoid the 
formation of adduct ions with alkali metals. These would 
reduce the abundance of protonated quasimolecular ions. 
This is easily achieved either by dialysis (Section 7.6.1), 
by gel filtration (Section 2.4.2) or by reversed phase HPLC 
of the sample (Section 2.4.3). Early studies of peptides us- 
ing EI/MS and CI/MS involved rendering sample molecules 
volatile by their extensive derivatization by acylation or es- 
terification. However, using matrix-based ionization modes, 
this is no longer necessary. For ionization modes using liquid 
matrices, it is simply required to mix the protein (dissolved 
in water) with a solution of the matrix. For modes using 
solid matrices (e.g. MALDD), the mixture of matrix/sample 
is allowed to dry on the probe before acquiring the 
spectrum. 


4.2.2 MS Modes Used in the Study of 
Proteins/Peptides 


Information on intact protein molecular mass is readily 
achieved by FAB, PDI, MALDI, ESI and so on. In general, 
use of a TOF analyzer facilitates determination of very large 
masses. Fragmentation information may be obtained either 
by CID or by pretreating sample proteins chemically or en- 
zymatically, for example by digesting with trypsin followed 
by peptide mass estimation. If the sequence of the protein is 
known, it may be possible to assign masses to each peptide 
generated in the digest. A further level of resolution is added 
to MS analysis by interfacing with other high-resolution 
techniques such as HPLC and electrophoresis. 


4.2.3 Fragmentation of Proteins/Peptides 
in MS Systems 


Proteins and peptides are oligomers of repeating 
(-NH-CHR-CO-) units, differing only in the nature of the 
side-chain, R. The amino acid residues are held together by 
peptide bonds, (-NH-CO-), which have lower bond ener- 
gies than standard (-N—C-—) bonds. When proteins or pep- 
tides fragment (e.g. due to heat or acid hydrolysis), peptide 
bonds are commonly broken releasing mostly intact amino 
acid residues. In MS experiments (e.g. FAB/MS), however, 
there may also be fragmentation at other locations in ad- 
dition to peptide bonds resulting in a complex pattern of 
ions. These are related to each other by incremental m/z dif- 
ferences because only 20 R-groups of known structure are 
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Figure 4.19. Fragmentation of protonated peptide in FAB/MS (i) Positive ions may be generated at the C- or N-terminus of peptide bonds. 
(i) Main-chain fragmentation. (iii) Side-chain fragmentation. Note that dn and wn -type ions facilitate distinction of isomers such as Leu 


and Ile. 


commonly found in proteins and because breakage points 
have fixed structural locations relative to each other. It is 
therefore possible to interpret complex MS spectra to learn 
about the structure and especially amino acid sequence of 
peptides and proteins (Section 4.4.6). A nomenclature pro- 
posed by Roepstorff is widely used to describe polypeptide 
fragmentation in MS (Figure 4.19). 

Ions generated from a polypeptide during MS may retain a 
positive or negative charge either on their C- or N-terminus. 
A horizontal line pointing towards the C- or N-terminus at 
the breakage point is used to denote which fragment car- 
ries the charge in that particular ion (Figure 4.19). Two 
series of possible fragments are denoted by letters with a 
subscript number, n, to denote the sequence position of the 
residue. For the N-terminal ions these are a,, ba, Cn and d, 


(numbered from the N-terminus) while the C-terminal ions 
are designated Vn, Wn, Xn, Yn and zn (numbered from the 
C-terminus). The ion represented by a, represents the first 
residue in the sequence (minus the CO group) while az rep- 
resents the first two residues (minus the CO group), and so 
on. Identical main-chain breakage points are denoted by the 
pairs a/x, b/y and c/z (each member of a pair referring to 
a positive charge retained either on the N- or C-terminus, 
respectively). Breakage points denoted by d,,, v, and w, are 
due to side-chain fragmentations. These are used to distin- 
guish residue pairs such as Leu/Ile and Gln/Lys which have 
identical masses (i.e. they are isobaric). 

An alternative fragmentation method is offered by in- 
teracting samples with electrons in electron capture dis- 
sociation (ECD) and electron transfer dissociation (ETD). 
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Figure 4.20. Collision-Induced Dissociation (CID) in ESI. Three quadropole analyzers (Section 4.1.4) are used to select for ions with 
different m/z ratios. Q1 selects intact ions of specific m/z. A collision gas (argon) is introduced into the second analyzer (Q2) called the 
collision cell. Collision with this gas causes fragmentation of the ion into smaller ions. These are analysed in the third quadropole analyzer, 


Q3. Other formats for this kind of experiment are also possible. 


Both methods are based on reaction of electrons with pep- 
tide (or intact protein) cations in the magnetic field of an 
FTMS instrument (Section 4.1.4). In ECD, the protonated 
sample anion interacts directly with a low-energy electron 
while in ETD radical anions are the source of electrons. 
Peptide/protein cations with an additional electron are gen- 
erated which undergo rearrangement resulting in backbone 
fragmentation. Importantly, there is no fragmentation of 
bonds in side-chains. Thus, unlike collision-based frag- 
mentation, ECD/ETD are relatively insensitive to peptide 
length (collision-based fragmentation is poor with either 
large or small tryptic peptides) and to post-translational 
modifications (these are often labile in collision-based 
dissociations). 


4.3 INTERFACING MS WITH 
OTHER METHODS 


Interfacing MS with other high-resolution methodologies 
greatly extends the range of analytical problems which may 
be addressed in biochemistry. This topic is further explored 
later in this volume in the context of proteomics (Chapter 
10). Interfacing may be with MS alone (e.g. FAB or ESI/MS) 
or with tandem MS. Major technical challenges encountered 
in developing such methodologies include matching ana- 
lytical system flow-rates with those of MS and achieving 
ionization of sample presented in a continuous flow to the 
MS system. In the following sections, we will look at some 
two-dimensional analysis systems possible with MS 


4.3.1 MS/MS 


Fragmentation can be achieved by a process called collision- 
induced dissociation (CID). This involves introduction of a 
single-mass ion of interest from the analyzer into a second 


analyzer called the collision cell where it may collide with 
an inert gas such as argon (Figure 4.20). The ion fragments 
into a characteristic set of daughter ions which can be anal- 
ysed by MS in a third analyzer. Since this experiment in- 
volves the successive use of three MS analyzers, one to 
select the parent ion, one to fragment it and one to analyse 
the daughter ions, it is called tandem MS (other terms for 
this technique include MS/MS, MS? and four-sector MS). It 
allows selective fragmentation of single ions from the first 
analyzer. 


4.3.2 LC/MS 


We have seen that MS analysis of components of com- 
plex samples may be complicated by selective desorption 
of sample components at rates dependent, for example, on 
their surface activity (Section 4.1.3). By first fractionating 
samples by HPLC (Chapter 2) and following this with MS 
analysis, we can combine high-resolution chromatographic 
separation with high-resolution analysis (i.e. mass deter- 
mination). This is called HPLC/MS or LC/MS. In such a 
system, MS can be regarded as essentially a detector for the 
LC system, but one which gives useful structural informa- 
tion about each component of the sample. An outline of the 
experimental arrangement used in LC/MS is shown in Fig- 
ure 4.21. Although a variety of chromatography modes are 
possible for LC, in practice reversed phase chromatography 


Figure 4.21. Schematic of LC/MS system. The role of the interface 
is to transfer sample components effectively and quantitatively to 
the MS, prepare sample components for ionization, cope with LC 
solvent and bridge pressure differences between LC and MS. 
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Figure 4.22. The ionspray interface. Sample flows through HPLC and is converted into a fine mist of electrically-charged droplets by 
electrospray (Section 4.1.3). These droplets are rapidly dried by a flow of nitrogen which removes solvent (shown as red ovals). 


is used most often for protein samples. It is feasible to carry 
out LC/MS analysis off-line (i.e. by first collecting peaks 
from LC and evaporating solvent followed by MS analysis), 
but on-line analysis is preferable in most applications. A 
feature of all LC/MS experiments is the need for an inter- 
face between the two systems. A wide variety of interface 
designs are possible but not all of these are suitable for sam- 
ples such as biopolymers. The ionspray interface (Figure 
4.22) is particularly useful for protein and peptide samples 
because it facilitates the high flow-rates of LC systems as 
mentioned earlier (Section 4.1.3). 

Flow-rates of most LC systems (e.g. 1-2 ml/min) are gen- 
erally far higher than those of MS If the two systems were 
simply connected together then the vacuum component of 
the MS could not be maintained due to limitations in the 
pumping capacity of the system. There are a number of 
ways in which this problem can be overcome. One possibil- 
ity is enlargement of MS pumping capacity. Alternatively, 
solvent (usually volatile organic solvents) could be removed 
prior to introduction of sample components into the high- 
vacuum region of the MS by an analyte-enrichment inter- 
face. This preferentially removes solvent rather than sample 
components leading to enrichment for the latter. Yet another 
approach is postcolumn splitting and diversion of solvent 
flow (Figure 4.23). In addition, the volume of the column 
may be reduced by the reduction of column diameter, that 
is the use of columns with small internal diameters (e.g. 
1-2 mm) or capillary columns (internal diameters; approx. 
50-320 um). These possibilities are not exclusive of each 
other and several of them are found in LC/MS formats used 
for the study of proteins. 

The most common ionization modes used in LC/MS are 
ESI and FAB although it has recently become feasible con- 
tinuously to introduce sample from LC either as an aerosol 
or mixed with a liquid matrix to MALDI. 


4.3.3 GC/MS 


GC analysis (Section 2.1.4) is especially useful in study 
of low molecular weight, volatile samples and has no di- 
rect applications to the study of proteins or other biomacro- 
molecules. However, the technique facilitates very high res- 
olution separation of molecules such as drug metabolites, 
toxins and steroid hormones which are of biochemical in- 
terest. Just as with LC/MS, the analytical potential of GC can 
be greatly increased by interfacing it with MS in GC/MS. 
Mass estimates allow identification of the most likely chem- 
ical structure for each sample component and the identifica- 
tion, for example, of metabolites of a parent compound. The 
problem which we have seen presented by LC flow-rates is 
potentially even greater in the case of GC where flow-rates 
can exceed 20 ml/min. This is mainly overcome by the use 
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Figure 4.23. LC/MS system suitable for analysis of proteins by 
electrospray MS. Experimental setup featuring splitting of solvent 
flow and use of small-diameter HPLC. Analytical HPLC of sample 
is facilitated by splitter 1 while splitter 2 allows collection of sample 
components. 
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Figure 4.24. Analyte-enrichment interface for GC/MS Sample en- 
ters interface from GC. Sample components diffuse more efficiently 
across a dimethylsilicone membrane than the carrier gas. 10—30-fold 
enrichment of sample is achieved with this interface. 


of GC capillary columns which have low flow-rates (e.g. 
1 ml/min). An alternative approach is use of highly-efficient 
analyte-enrichment interfaces of the type described above. 
These depend on the fact that, in GC, a carrier-gas is used 
instead of a liquid solvent. This is usually lighter than the 
sample components and may have different diffusion prop- 
erties. An example of such an interface is shown in Figure 
4.24. When capillary columns are used, flow is usually di- 
rectly into the MS instrument without the use of splitters or 
enrichment interfaces. 


4.3.4 Electrophoresis/MS 


High-resolution separation of complex samples may be rou- 
tinely achieved by carrying out electrophoresis under a wide 
range of experimental conditions and in a wide variety of 
formats (Chapter 5). The highest-resolution analytical elec- 
trophoretic systems, capillary electrophoresis (CE; Section 
5.10) and capillary isotachophoresis (CITP; the physical 
basis of isotachophoresis is described in Section 5.3.1) take 
advantage of electrophoretic flow in extremely narrow capil- 
laries under very high voltages (10-60 kV). These capillaries 
efficiently dissipate heat, allowing high-resolution separa- 
tion without deterioration of sample. This can be applied to a 
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Figure 4.25. CE-MS electrospray interface. Flow of sheath liquid 
(approx. 5 ul/min methanol, propanol or acetonitrile) adds to CE 
flow, resulting in increased flow-rate to the electrospray. This system 
functions with aqueous buffers and low inner diameter capillaries. 
The electrospray electrode is stainless steel. 


wide spectrum of bioseparations including enantiomeric res- 
olution of low molecular weight isomers and separation of 
peptides, proteins, DNA, viruses and even whole cells. Since 
they are capillary-based methods with consequent small to- 
tal volume, both CE and CITP use extremely low sample 
loadings and flow-rates compared to other electrophoretic 
and chromatographic methods. 

In interfacing these procedures to MS, we are therefore 
presented with the problem of matching the very low flow- 
rates of CE/CITP (0-100 nl/min) with those of MS but 
without losing the high resolution achieved in CE/CITP by 
peak broadening effects. One approach to this is the addi- 
tion of make-up flow postcapillary that is the addition of 
extra solvent to increase the flow volume to that of the MS 
system. Alternatively, a low dead-volume T-piece may be 
inserted between the capillary and CE/MS interface. A fur- 
ther problem may be presented by the low migration times 
possible in CE systems (e.g. 100-1000 s). This may be too 
short for sufficient scans to allow accurate characterization 
of MS peaks. 

As with LC/MS and GC/MS, interfaces are also used in 
electrophoresis/MS Among the most popular of these is the 
electrospray interface (Figure 4.25). This illustrates the use 
of make-up flow to facilitate low CE flow-rates and also 
facilitates the use of CITP/MS CITP is a form of CE which 
results in concentration of sample components into compact 
bands and which allows somewhat larger loading volumes 
than standard CE. 


4.4 USES OF MASS SPECTROMETRY 
IN BIOCHEMISTRY 


Variations in protein structure are often of great impor- 
tance biochemically. Many techniques used in the study of 
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proteins may not adequately distinguish such variants from 
each other. However, the high accuracy mass determination 
possible by MS analysis makes it feasible to distinguish 
proteins which may differ from each other by only a few 
AMU. Moreover, since mass values are directly related to 
distinct chemical structures, MS analysis makes interpre- 
tations possible which might be problematic using other 
analytical procedures. One of the first demonstrations of the 
usefulness of MS analysis of proteins was the demonstration 
that cDNA-derived sequences of some proteins were incor- 
rect. The following sections describe some further examples 
which illustrate the versatility of MS in structural studies of 
biomacromolecules. 


4.4.1 MS and Microheterogeneity in Proteins 


When expressed in heterologous expression systems (e.g. 
a human gene cloned into and expressed in Escherichia 
coli), many proteins are inappropriately, incompletely or 
otherwise incorrectly processed. This arises from the fact 
that the expression system may lack particular enzymes re- 
sponsible for catalyzing processing or else these enzymes 
may simply not recognize processing sites due to species- 
specific sequence differences. This may happen either within 


Blocking 


N 


i 


Aminopeptidases 


Glycosylation 
Phosphorylation 
Acylation 

Disulphide formation 
Charge isomerisation 


Endopeptidases 


or at the termini of the polypeptide chain (Figure 4.26; 
Table 4.3). 

Eukaryotic proteins are biosynthesized with the initia- 
tion methionine intact which is post-translationally removed 
by enzymatic digestion. Prokaryotic proteins, by contrast, 
are biosynthesized with an initiation N-formyl methion- 
ine. Prokaryotic and eukaryotic systems may lack appro- 
priate enzymes for removal of initiation residues from eu- 
karyotic and prokaryotic proteins, respectively. A range of 
other chemical structures are also possible at the N-terminus 
(Table 4.3), which can result in N-terminal ‘blocking’ 
(i.e. an inability to sequence the protein chemically us- 
ing the Edman degradation). This can be important, for 
example, if N-terminal sequence is required by regula- 
tory authorities as confirmation of protein identity/purity 
during commercial production of recombinant proteins. 
Heterogeneity is also possible at the C-terminus of the 
polypeptide chain. Protein biosynthesis may be prematurely 
terminated in heterologous expression systems which can 
result in a variety of C-terminal sequences, called ‘ragged 
ends’. Moreover, bacterial cells may contain a variety 
of exopeptidases which may cause proteolysis at the N- 
or C-terminus of proteins/peptides during the purification 
procedure. 


Ragged ends 


ig 


Carboxypeptidases 


Figure 4.26. Some sources of covalent modification of polypeptides. Polypeptides may be covalently modified internally or at their termini. 


These modifications result in altered molecular mass detectable by MS. 
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Table 4.3. Some possible sources of heterogeneity within polypeptides 
Modification Possible structures 
N-terminus 
Acylation —NH> 
—NH-CO-CH; 
N-formylation —NH) 
—NH-—CO-H 
N-formy] methionine —NH, 
z (CH2)2— S- CH3 
—NH- CO- CH N 
NH-CO-H 
Pyrollidone carboxylic NH> 
acid (due to cyclisation Nae CH3 
of glutamine) c= o 8 
CH =O 
(CH2) \ : F 
l —CO—-CH-NH 
—CO-CH-NH)2 
Aminopeptidases and R> Ri 
aminoproteinases@ | | 
—CO-CH-NH-CO- CH-NH) 
R2 
—CO-CH-NH3 
Internal: 
Glycosylation’ OH 
OH 
OH 
CH2 
t= OH NACo 
—CO-CH-NH- i 
ÇH 
—CO-CH-NH- 
Phosphorylation” o 
l 
OH o= p =0 
| 
Çh ° 
—CO-—CH=NH= CH) 
—CO-CH-NH- 
Charge isomerisation’ NH> o- 
(e.g. deamidation) `e z0 i =0 
| 
(CH2) (CH2) 
| 
-Cco- CH-NH, —CO-CH-NH; 


(Continued) 
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Table 4.3. (Continued) 


Modification Possible structures 
Disulphide formation® 7) 
NH NH 
CH" CH= SH SH=ÇH=CH 
co co 
A | Le 
A 
NH NH 
CH- CH2- S— S-CH2—CH 
co co 
© Ses 
Oxidation CH; CH; CH; 
re p~ pis 
` po eje 
CH3 CH: CH3 
—CO-CH-NH- —CO-CH-NH- —CO-CH-NH- 
Methionine Sulphoxide Sulphone 
Endopeptidases and Rə Ri 
endoproteinases? 


l l 
—CO—CH- NH- CO- CH- NH- 


| 


is jà 
—CO-CH-NH; COOH-CH-NH- 


C-terminus: 
Carboxypeptidases@ R3 R, 
l 


l 
—NH- CH- CO- NH- CH- COOH 


, 


is T 
—NH-CH-COOH NH2;—CH-COOH 
ragged ends’ R2 Ri 
—NH- CH-CO-NH-CH- COOH 
R2 
—NH- CH- COOH 


“Also possible at threonine as well as serine. Glycosylation is also possible at asparagine and lysinc residues. 
» Also possible at asparagine. Deamidation of asparagine and glutamine results in charge isomer formation as these amides are uncharged. 


CMS of samples treated/untreated with reducing agents would be expected to give distinct patterns depending on number and location of 
disulphides. 


4Exposure to these enzymes may occur artefactually during protein purification. 


Sources of possible heterogeneity within polypeptide 
chains include glycosylation, phosphorylation, acylation, 
disulfide bond formation, charge isomerization and endo- 
protease activity. Some examples of these are described in 
more detail in the following sections. 

Covalent modifications (Figure 4.26; Table 4.3) result in 
mass alterations measurable by MS Comparison of m/z val- 
ues of variants of proteins allows us, therefore, readily to 
identify such modifications. This determination is indepen- 
dent of the chemical basis of heterogeneity and this is a 
major reason for the versatility of MS. 


4.4.2 Confirmation and Analysis of Peptide Synthesis 


Solid-phase peptide synthesis (Figure 4.27) has proven an 
invaluable tool for the generation of model peptides which 
can be used in a wide variety of studies (e.g. synthesis of epi- 
topes for antibody preparation, design of novel antibiotics 
and identification of novel protein ligands in drug screen- 
ing programs). The technique involves use of amino acids 
which are initially chemically protected or blocked at -NH2 
and other functional groups (e.g. e- NH2, -SH) while chemi- 
cally activated at their COOH group. These amino acids will 
readily react with the exposed -NHp group of an amino acid 
immobilized on a solid phase, thus forming a peptide bond. 
De-blocking of the blocked -NH, group allows a second 
cycle of reaction leading to formation of a second peptide 
bond and, in this way, a polypeptide chain of 20-30 residues 
may be synthesized from the C-terminus. When synthesis is 
complete, the peptide is cleaved from the resin (exposing a 
C-terminal -COOH) and functional groups are deblocked. 
This procedure is readily automated and peptides may be 
routinely synthesized by adding commercially-available ac- 
tivated/blocked amino acid derivatives in the desired se- 
quence. 

Common problems which occur in peptide synthesis and 
which lead to heterogeneity in the final peptide product are 
the formation of deletion peptides and incomplete deblock- 
ing. Modifications specific to particular residues such as 
air-oxidation of methionine or trifluoroacetylation of serine 
are also possible. Deletion peptides arise due to incomplete 
peptide bond formation in a synthetic cycle, resulting in 
a proportion of peptide lacking the residue corresponding 
to that cycle. Since deletion peptides will lack a particu- 
lar amino acid, they can be readily and sensitively iden- 
tified by MS analysis. Incomplete deblocking can result 
in functional amino acid side-chains retaining their block- 
ing groups, giving them an inappropriate covalent structure. 
Since the masses of blocking groups are known and vary 
between different amino acid side-chains, MS analysis al- 
lows identification of specific incomplete deblocking steps, 
facilitating their improvement in further rounds of synthesis, 
perhaps by increasing reaction times or by altering reaction 
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conditions. MS analysis of a synthetic peptide is described 
in Example 4.2. 


4.4.3 Peptide Mapping 


Digestion of proteins with specific endoproteinases results 
in a unique population of peptides. Reversed phase HPLC 
using C-18 stationary phase (Section 2.4.3) may be used 
to achieve high-resolution separation of these. This is par- 
ticularly useful for comparing related proteins such as 
haemoglobins or isoenzymes; where there is a degree of 
similarity in primary structure, we expect common peptides. 
However, in order to associate such peptides with specific 
regions of primary structure in the parent protein, we need 
to determine the individual sequence of each peptide (e.g. 
by the Edman degradation). MS provides a complementary 
method for analysis of peptide maps which, unlike HPLC 
analysis per se, gives molecular mass information about 
the peptides. Thus, it may be possible to associate a pep- 
tide with a particular part of the protein primary structure 
simply by mass comparison (Example 4.3) although care 
should be taken in such experiments to take the isobaric 
nature of some residues into consideration (e.g. Leu/Tle; 
Gln/Lys). 

By interfacing HPLC with MS (Section 4.3.1) or by per- 
forming tandem MS, we can achieve a detailed analysis of 
peptide populations which can be used to identify cleavage 
points in the original sample protein. 


4.4.4 Post-Translational Modification Analysis 
of Proteins 


The post-translational modifications summarized in Figure 
4.26 and Table 4.3 are important to protein function be- 
cause they may be implicated in processes such as protein 
activation/inactivation (e.g. protein phosphorylation in sig- 
nal transduction cascades involving G-proteins), recognition 
(e.g. glycosylation of receptors) or proteolysis (e.g. protein 
turnover). Chemical procedures for analysis of individual 
modifications such as glycosylation, acylation or phospho- 
rylation exist. However, these techniques are laborious and 
only give information about a particular type of modifi- 
cation. The observation that all of the post-translational 
modifications of proteins result in altered mass makes MS 
particularly suitable for their analysis. MS analysis of O- 
glyosylation is described in Example 4.4. 


4.4.5 Determination of Protein Disulfide Patterns 


Several methods (including MS; see next section) are 
available for determination of amino acid sequences. How- 
ever, the complete covalent structure of a protein is only 
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F-CH, 
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Figure 4.27. Solid phase peptide synthesis. (a) An amino acid blocked at the N-terminus (with a 9-fluoenylmethoxycarbony]; F-moc group) 
and activated at the C-terminus (via a pentafluoropheny] ester). (b) Solid phase peptide synthesis procedure involving successive cycles of 
deblocking and coupling to extend the polypeptide chain from the C- to the N-terminus. 
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serine-1 by tandem MS. 


Press, Inc. 


Example 4.2 Determination of microheterogeneity in synthetic peptides 


A peptide was synthesized by the Merrifield method (Fmoc NH>-blocking groups) with the following sequence: 


NHo-Ser-Asn-Tyr-Thr-Phe-Ser-Gln-Met-COOH 
(predicted m/z; 977 Da). 


EIS/MS was performed in a triple-quadropole instrument (Figure 4.12) and the result is shown below: 


The M + H ion with a m/z of 977.4 is the major product but a number of minor peaks are also evident in the 
spectrum. Ions with m/z greater than this are attributed, respectively, to oxidation of methionine-8 (+16 AMU), a peptide 
retaining a tert-butyl blocking group (+56 AMU) and the presence of an O-trifluoroacetyl group on a serine or threonine 
(+96 AMU). The species with m/z of 849 is a deletion peptide lacking glutamine-7 (— 128 AMU). More detailed analysis 
of peptides is possible by tandem MS including, for example, unique identification of the O-trifluoroacetylated residue. 
Of the three possible sites of trifluoroacetylation in this example (2 serines, 1 threonine), this residue was identified as 


Taken from Metzger et al., Analytical Biochemistry (1994) 219, 261-277, reproduced with permission from Academic 


MASS SPECTROMETRY 135 


Am=+16 


| 
K 


1000 1100 1200 


known when the number and location of disulfide bridges 
involving cysteine residues have been determined. This is 
especially important for proteins normally secreted from 
their cells of origin (e.g. hormones, immunoglobulins). 
Disulfide bridges are used to stabilize otherwise unstable 
structures and also represent important intermediates in 
the folding pathways of such proteins. In addition, the 
disulfide bridged polypeptide is an oxidized form of the 
protein which is appropriate to the oxidizing nature of 
the extracellular environment. Indeed, in protein engi- 
neering, disulfide bridges are frequently introduced into 
proteins to enhance their stability to heat (e.g. in the design 
of enzymes suitable for enantioselective synthesis). 


The sulphydryl groups of cysteine are among the most 
chemically-reactive side-chains found in proteins. Advan- 
tage may be taken of this to alkylate or oxidize free sul- 
phydryls or to reduce disulfides to sulphydryls (Figure 4.28). 
Chemical treatments of this type are often used (in associa- 
tion with proteolytic digestion) to determine the pattern of 
disulfide bridge formation in proteins by MS (Example 4.5). 

Digestion of protein samples of known sequence with 
agents of high specificity generates a characteristic mixture 
of peptides with easily predicted m/z values on FAB/MS 
The presence of disulfide bridges results in one or more 
unexpectedly large ions (Figure 4.29). Reduction prior to 
digestion leads to the disappearance of these large ions and 


136 PHYSICAL BIOCHEMISTRY 


Example 4.3 FAB/MS analysis of peptide maps 


the mass difference between glutamine and arginine. 


(a) 
a-globin 


T4 


Relative abundance 


1500 


1600 


(b) 


Relative abundance 


A novel haemoglobin mutant was digested with trypsin generating a population of peptides. One of these peptides was 
purified and subjected to FAB/MS (A below). Instead of the expected m/z of 1833.9 found with wild-type a-globin, a 
new peptide with m/z of 1634.8 was detected. This is consistent with a glutamine — arginine mutation at position 54. 
This conclusion was confirmed by digestion with a second endoproteinase, lysylendopeptidase (Lys-C). This enzyme 
does not cleave at arginyl peptide bonds, unlike trypsin. The FAB/MS spectrum for this digest (B below) shows an ion 
with m/z of 1862, an increase of 28 AMU on the value of 1833.9 obtained for the wild type protein. This corresponds to 


(Taken from Structural Analysis of Protein Variants, Yoshido Wada in Protein and Peptide Analysis by Mass 
Spectrometry (J. Chapman ed.), 1996, Human Press, reproduced with permission. 
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the appearance of pairs of smaller ions in the spectrum. In 
this way, disulfides can be identified in proteins and allocated 
to regions of primary structure based on m/z values. The total 
number of free sulphydryl groups and disulfide bonds in the 
protein may be determined by comparison of m/z of alky- 
lated samples of both reduced and nonreduced protein, while 
oxidation with performic acid results in a characteristic in- 
crease in the mass of cysteine (and methionine)-containing 
peptides (Example 4.5). 

Similar experiments to these can be performed using re- 
versed phase C-18 HPLC (Section 2.4.3) but MS has the 
advantage that m/z data are measured in the experiment 
which may be correlated directly with the protein primary 


structure. In HPLC, it is necessary first to collect individual 
peptides before chemically determining their sequences. 


4.4.6 Protein Sequencing by MS 


The main strategies for obtaining protein sequence infor- 
mation in biochemistry are cDNA sequencing and direct 
chemical sequencing by the Edman degradation. The for- 
mer method has the limitation that it tells us nothing about 
post-translational modifications of the protein such as those 
described in Section 4.4.4 while the latter method is only ap- 
plicable to protein samples with unblocked -NH3 groups and 
is generally limited to short sequences (e.g. <40 residues). 
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Example 4.4 Determination of erythropoietin glycosylation pattern by FAB/MS 


Erythropoietin is a 165-residue protein hormone produced in the kidney and liver which plays a role as a physiological 
modulator of erythrocyte production. FAB/MS previously revealed that removal of arginine-166 was essential for activa- 
tion of recombinant erythropoietin. The active hormone is extensively glycosylated and this is the basis of isoforms of the 
protein. When digested with lysylendopeptidase (Lys-C), a glycosylated peptide is released. FAB/MS analysis revealed 
the following mass spectrum: 

The species with m/z 2500.6 corresponds to the following unglycosylated erythropoietin sequence; 


NH)-Glu-Ala-Ile-Ser-Pro-Pro-Asp-Ala-Ala-Ser-Ala-Ala-Pro-Leu-Arg-Thr-Ie-Thr-Ala-Asp-Thr-Phe-Arg-Lys-COOH 


The pattern of glycosylation explaining the other ions may be described diagrammatically as follows: 
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We have already seen that polypeptides fragment into determined by measuring differences in m/z between con- 
sets of related ions during some MS procedures for which secutive peaks in the mass spectrum while the sequence 
accurate m/z values can be recorded (Section 4.2.3). Com- is determined by the order of the peaks. Information on 
parison of such ions allows identification of short sequences post-translational modifications can also be detected as m/z 


of proteins and peptides. Identities of amino acids may be values of modified residues including the mass of modifying 
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(i) ii + I-CH,— COOH —> T oo oe 


(ii) S—s + 2(H5— (CH — OH) —> 5—S—(CH)—OH 
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+ 
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+ CHO— OOH — > so; 


Cysteic acid 


Figure 4.28. Chemical reactions of cysteine. (i) Alkylation with 
iodoacetic acid, (ii) Reduction of disulfide bridge with 2- 
mercaptoethanol, (iii) Oxidation with performic acid (note; this 
treatment also oxidizes methionine to the corresponding sulphone). 


groups (e.g. +80 Da for phosphate attached to a serine or 
threonine; see Example 4.6). 

Due to the complexity of ion patterns generated in MS of 
peptides/proteins, direct de novo sequencing is a very chal- 
lenging undertaking. However, when combined with other 
experimental approaches, MS can be very informative. Usu- 
ally, it is carried out on samples which have been digested 


with a number of specific cleavage agents such as proteases 
before MS analysis. 

Combining some elements of the Edman degradation 
with MALDI/TOF MS is called protein ladder sequenc- 
ing (Figure 4.30). In this method, fragments of the pro- 
tein to be sequenced are generated by partial Edman degra- 
dation which are then subjected to MALDI/MS (Example 
4.6). The fragments may be generated in an automated pro- 
tein sequencer by incomplete reaction with phenylisothio- 
cyanate (PITC) or by the inclusion of a small amount of a 
terminating agent. For example if 5% phenylisocyanate is 
included with PITC, a small amount of phenylcarbamyl- 
peptide is generated in each cycle of the Edman 
degradation which is stable to subsequent processing. These 
fragments may be analysed after the desired number of se- 
quencing cycles in a single MALDI/MS experiment. The 
mass spectrum resembles a ladder with the spaces be- 
tween each ‘rung’ corresponding to the mass of a partic- 
ular residue. It should be noted that this technique cannot 
distinguish between leucine/isoleucine which have identical 
mass (Section 4.2.3) but glutamine and lysine (which also 
have the same mass) may be readily distinguished since 
the e-NH) group of lysine reacts with PITC to give a mass 
larger by 135 Da, while glutamine does not react with this 
reagent. 


permission. 


(a) 


(b) 


Example 4.5 Identification of disulfide bridges by FAB/MS 


Vasopressin is a peptide hormone of the sequence Cys-Tyr-Phe-Gln-Asn-Cys-Pro-Arg-Gly, which contains a single 
disulfide bridge between Cysteines-1 and -6. FAB/MS of A) untreated and B) oxidized (i.e. treated with performic 
acid) vasopressin are shown below. The presence of a cysteic acid at positions 1 and 6 in the sequence results in a 
characteristic m/z increase of 98 AMU. This treatment can be used to estimate the total number of cysteines in a peptide 
or protein sample. Sun & Smith (1988) Analytical Biochemistry 172, 130-138. Academic Press, Inc, reproduced with 


1182 


S— sS SH SH SH SH 
Reduced 


Non-reduced 


| FAB/MS 


Figure 4.29. Identification of intrachain disulfide bridges in pro- 
teins. Based on the known sequence of the protein, three peptides 
are expected from proteolytic cleavage. One of the peptides (3) has 
the predicted m/z, while a single, unexpectedly large ion is also 
obtained. On reduction, this large ion disappears and is replaced 
with two smaller ions of predicted m/z. The precise location of the 
intrachain disulfide between cysteines of peptides 1 and 2 may be 
determined from this experiment. 


Sequences of up to 30 residues may be measured by 
protein ladder sequencing which is similar to the upper 
limit of the Edman degradation. Since this method gen- 
erates structural information (e.g. identification of block- 
ing or modifying groups) in addition to sequence infor- 
mation, it neatly complements cDNA and direct chemical 
sequencing. 


4.4.7 Studies on Enzymes 


Enzymes achieve very high rate enhancements (the ratio of 
the reaction rate in presence of enzyme to that in its ab- 
sence) which enable the very diverse spectrum of chemical 
reactions found in living systems to occur under relatively 
mild chemical conditions of pH, temperature and pressure. 
Enzymes bind substrates to form an enzyme-substrate com- 
plex within which a chemical reaction happens converting 
the substrate to product. A key component of catalysis is 
stabilization of a reaction intermediate called the transition 
state. This is the least stable species in the reaction pathway 
which has a vanishingly short half-life (~107" s). Since 
transition states are generally highly specific to particular 
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reaction types, there is much interest in understanding more 
about them and molecules structurally resembling transition 
states are promising candidates as new drugs. Very high res- 
olution MS methods such as fourier transform ion cyclotron 
resonance MS can detect tiny mass differences in enzymes 
and their complexes giving powerful insights into enzymatic 
catalysis (Example 4.7). 


4.4.8 Analysis of DNA Components 


Because MS is simply concerned with the measurement 
of m/z values, it is a particularly versatile technique for 
the analysis of nonprotein biomolecules. MS has been ap- 
plied to the analysis of carbohydrates, nucleic acids and 
heterogeneous molecules such as proteoglycans. DNA is 
of particular interest in this regard since modest changes 
in the covalent structure of DNA components can result 
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Figure 4.30. Protein ladder sequencing by partial Edman degra- 
dation. The Edman degradation results in step-wise removal of 
residues at the N-terminus. This generates a nested series of frag- 
ments, with consecutive fragments differing from each other by 
a single residue (i.e. 1-5). This series is analysed in a single 
MALDI/MS experiment and the ‘ladder’ of fragments is interpreted 
as an amino acid sequence. 
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Example 4.6 Protein ladder sequencing by MALDI/MS 


A peptide which is a substrate for protein kinase C was subjected to partial Edman degradation as outlined in Figure 
4.30. The MALDI/MS spectra obtained from A) nonphosphorylated and B) phosphorylated peptides are shown 
below. 


(a) 
(a) 


(b) 


1465.9 S 


900 1000 1100 1200 1300 1400 1500 
Mass/charge 


These spectra could be interpreted by measuring incremental mass differences between successive peaks a-f and 
a'—e', respectively (e.g. 1064.3 — 977.2 = 87.1, the mass of a Ser residue). The following sequence was read out for 
the nonphosphorylated peptide; Ser-Ser-Ala-Arg-Ser. Ser-2 in this sequence is the site of phosphorylation by protein 
kinase C. Note the mass shift in the spectrum at the fragment corresponding to the phosphorylation site (i.e. m/z 
of ions c’, d’ and e’ are greater by 80.1 Da than corresponding ions c-e in A, due to the mass of the phosphate 


group). 


in serious biological consequences. For example, many 
mutagens cause their cytotoxic effects by forming covalent 
adducts with DNA bases. ESI/MS techniques have been 
extensively applied to the characterization of such DNA- 
chemical adducts (Example 4.8). Other applications of MS 
to the study of DNA include the study of noncovalent bind- 


ing to regulatory DNA sequences and duplex formation in 
oligonucleotides. A particular problem which may be ob- 
served with DNA samples is the presence of large amounts 
of sodium ions complexed to phosphate groups. These can 
be efficiently removed by use of on-line microdialysis prior 
to MS. 
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Example 4.7 Detection of a single water molecule in the transition state complex of cytidine deaminase 


Cytidine deaminase (CDA) removes an amino group from cytidine to form uridine. This is a key target in stopping rapidly 
dividing cells (e.g. bacteria, cancer cells) meeting their requirements for new DNA. The enzyme is a dimer containing 
a single active site per subunit and requires a Zn atom at each site for addition of water to cytidine. Analogues of 
cytidine lacking -NH> bind tightly to CDA and are thought to be transition state analogues. One such analogue, 5-fluoro- 


3, 4-dihydrouridine (FDHU) is formed by addition of water to an inhibitor of the enzyme, 5-fluoropyrimidine-2-one 
ribonucleoside (FZeb), catalyzed by CDA: 


A H B HO. „H 
it 
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HO OH HO OH 
CoH11FN2Os (FZeb) CoH13FN206 (FDHU) 
exact mass 246.07 exact mass 264.08 


It can be predicted that the missing -NH3 group in transition state analogues of CDA might leave a ‘hole’ in the 
complex formed. It is not known if this is filled by something else or is, perhaps, removed by a structural rearrangement 
within the protein. In this experiment CDA exposed to FZeb was subjected to fourier transform ion cyclotron resonance 
MS using electrospray ionization and the mass of the complex accurately confirmed: 


MW so: 31,538.96 Da MW eee? 31,539.02 Da. 


ss 
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31533 31538 31543 31548 mz 


This showed that the mass of the dimer-FZeb complex observed (MWeetex) Was within 2 ppb of that theoretically 
predicted from atomic composition (MW neor) for the enzyme dimer plus two Zn atoms plus two FZeb molecules. The 


very high-resolution mass determination made possible by the fourier transform ion cyclotron detector allowed the 
conclusion that CDA was essentially in its native structure in the gas phase. 


(Continues)_| 
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M (Continued) 


Careful analysis of deconvoluted CDA-FZeb mass spectra revealed three peaks separated by ~264 Da, the mass of 
FDHU, which were not found in spectra of the protein alone: 
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This suggested that CDA had converted FZeb into FDHU (confirming its native status). The data are interpretable in 
terms of a CDA-FDHU complex containing two Zn atoms and two water molecules per CDA dimer. It was concluded 
that a single water molecule occupies the niche vacated by the absent -NH) and thus contributes to stabilization of the 
complex between cytidine deaminase and FDHU. In the absence of FZeb, water is lost to the vacuum and thus does not 
feature in the MS of CDA alone. 

(Reproduced with permission from Borchers et al. (2004), Fourier transform ion cyclotron resonance MS reveals the 
presence of a water molecule in an enzyme-transition state analogue complex. Proceedings of the National Academy of 
Sciences USA 101, 15341-15345) 
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Example 4.8 Characterization of 7-(2-hydroxyethyl) guanine from DNA by ESI/MS 


A covalent adduct is formed between guanine of DNA and ethylene oxide, a commonly-used sterilant in synthetic 


chemistry. 
4 o 
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Guanine Ethylene oxide 7-(2-hydroxyethyl) guanine 


CH,CH,CH 
| 


100 110 120 130 140 150 160 170 180 190 200 210) 220 230 240 250 260 270 280 29% 300 


This adduct is known to be both genotoxic and carcinogenic in man and rodents. DNA was exposed to ethylene oxide 
in vitro and hydrolyzed to release base adducts. The sample was fractionated by C-18 reversed phase HPLC followed by 
ESI/MS. The spectrum obtained is shown below. 

Using this HPLC/MS system, it was possible to detect hydroxyethylation of 1 in 106 DNA nucleotides. This 
type of approach could facilitate screening of exposed individuals and also screening for other types of DNA 
adducts. 
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Chapter 5 


Electrophoresis 


Objectives 


¢ The physical basis of electrophoresis. 
° The range of electrophoresis formats available. 


After completing this chapter you should understand: 


* How electrophoresis can be interfaced with other high-resolution techniques. 
e Applications of electrophoresis to the study of biomolecules such as proteins, peptides and DNA. 


Positive or negative electrical charges are frequently asso- 
ciated with biomolecules. These can be important for struc- 
tural reasons (e.g. ionic interactions leading to the forma- 
tion of salt bridges in proteins) and we have already seen 
how they may be taken advantage of in bioseparation pro- 
cedures (e.g. ion exchange chromatography; Section 2.4.1). 
When placed in an electric field, charged biomolecules move 
towards the electrode of opposite charge due to the phe- 
nomenon of electrostatic attraction. Electrophoresis is the 
separation of charged molecules in an applied electric field. 
The relative mobility of individual molecules depends on 
several factors the most important of which are net charge, 
charge/mass ratio, molecular shape and the temperature, 
porosity and viscosity of the matrix through which the 
molecule migrates. Complex mixtures can be separated to 
very high resolution by this process. A wide variety of elec- 
trophoretic formats are available for analytical, preparative 
and functional experiments in biochemistry. This chapter 
describes the physical basis and principal applications of 
electrophoresis. Electrophoresis applications to proteomics 
are described in Chapter 10. 


5.1 PRINCIPLES OF 
ELECTROPHORESIS 


5.1.1 Physical Basis 


If a mixture of electrically-charged biomolecules is placed 
in an electric field of field strength E, they will freely move 
towards the electrode of opposite charge. However, different 
molecules will move at quite different and individual rates 
depending on the physical characteristics of the molecule 
and on the experimental system used (Figure 5.1). The 


velocity of movement, v, of a charged molecule in an electric 
field depends on variables described by Equation (5.1); 


E-q 
v (5.1) 


where f is the frictional coefficient and q is the net charge 
on the molecule. The frictional coefficient describes fric- 
tional resistance to mobility and depends on a number of 
factors such as the mass of the molecule, its degree of com- 
pactness, buffer viscosity and the porosity of the matrix in 
which the experiment is performed. The net charge is de- 
termined by the number of positive and negative charges in 
the molecule. Charges are conferred on proteins by charged 
amino acid side-chains as well as by groups arising from 
post-translational modifications such as deamidation, acy- 
lation or phosphorylation. DNA has a particularly uniform 
charge-distribution since a phosphate group confers a single 
negative charge per nucleotide. Equation (5.1) means that, 
in general, molecules will move faster as their net charge 
increases, the electric field strengthens and as f decreases 
(which is a function of molecular mass/shape). Molecules 
of similar net charge separate due to differences in frictional 
coefficient while molecules of similar mass/shape may dif- 
fer widely from each other in net charge. Consequently, it is 
often possible to achieve very high resolution separation by 
electrophoresis. 

In practice, an electric field is established by applying 
a voltage, V, to a pair of electrodes separated by a dis- 
tance, d (Figure 5.1). This results in the electrical field of 
strength E; 


(5.2) 
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Starting point 


V 


Figure 5.1. Physical basis of electrophoresis. Molecules move in 
an electric field of field strength E depending on their net charge, 
molecular mass and shape. Note that smaller and more charged 
molecules move more rapidly towards the electrode of opposite 
charge and that molecules of different shape but identical net charge 
have different electrophoretic mobilities. 


The bulk of the current is carried between the electrodes 
by charged components of the buffer which also perform 
the function of maintaining a constant pH. This is especially 
important in electrophoresis since pH changes during the 
experiment would result in altered net charge and hence 
altered mobility. The most commonly-used buffer systems in 
electrophoresis of biomolecules are Tris-Cl or Tris-glycine. 
Buffers are held in reservoirs connected to each electrode 
and provide a constant supply of ions to the electrophoresis 
system throughout the separation. 

Ohm’s law relates V to current, Z, by electrical resistance, 
R, as follows; 


V=R-1 (5.3) 


We might predict that increasing V would result in much 
faster migration of molecules due to greater current. How- 
ever, experimentally we find that large voltages result in 
significant power generation which is mainly dissipated in 
the form of heat. The power (in Watts) generated during 
electrophoresis is given by Equation (5.4). 


W=I-R (5.4) 


Heat generation is undesirable because it leads to loss of 
resolution (due to convection of buffer causing mixing of 
separated samples), a decrease of buffer viscosity (which 
also decreases R) and, in extreme cases, structural break- 
down of thermally-labile samples such as proteins and nu- 
cleic acids. A decrease in R means that, under conditions 
of constant voltage, I will increase during electrophoresis 


which, in turn, leads to further heat generation (Equation 
(5.3)). In practice, constant voltage conditions are used in 
most electrophoresis experiments but, for certain applica- 
tions, a constant power supply may be used. A constant 
power supply allows V to change during the experiment to 
maintain a constant value of W (Equation (5.4)). In general, 
electrical conditions are selected which are adequate to sep- 
arate samples in a reasonable time-frame but which will not 
cause extensive heating. 

Because the electrical field strength, E, may vary widely 
between different experimental formats, the electrophoretic 
mobility, 1, of a sample is defined by Equation (5.5); 


=+ (5.5) 
H=% . 
Combining this with Equation (5.1) shows that; 
E-q_4 
SSS SS (5.6) 
Ef f 


That is, biomolecules migrate based on the ratio of net 
charge to frictional coefficient. Since f is strongly mass- 
dependent for classes of biopolymers of similar shape (e.g. 
globular proteins; linear DNA), differences in u approxi- 
mate closely to differences in charge/mass ratio. 

This description is adequate to understand how net charge, 
mass and shape underlie the separation of molecules in elec- 
trophoresis. It should be noted, however, that it is an incom- 
plete description since it does not include relevant factors 
such as possible interaction of the sample molecules with 
the support medium (e.g. gels; see next section), charge sup- 
pression on the surface of large biomolecules or differences 
due to the buffer composition used in the experiment. For 
this reason it is important to understand that electrophore- 
sis, as used in most situations in biochemistry, is largely 
an empirical technique. High resolution mobility data can 
be obtained for biomolecules by comparison with standard 
molecules of similar charge-density and shape and such 
measurements have proved useful in all areas of biochem- 
istry. However, it is not usually possible to make direct mea- 
surements (as compared to comparative measurements) of 
molecular mass or shape from electrophoretic mobility alone 
due to lack of detailed information on variables involved in 
the process. While most protein and nucleic acid molecules 
behave predictably in electrophoresis, there are many exam- 
ples of molecules of different charge/mass comigrating or 
of similar charge/mass separating due to differences in their 
electrophoretic mobility. 


5.1.2 Historical Development of Electrophoresis 


Early electrophoresis experiments were performed in 
free solutions of aqueous buffers. In 1937, the Swedish 


Electrode 


t | 


Protein _ 
solution 


Current Current 
Figure 5.2. Moving boundary apparatus of Tiselius. When cur- 
rent is turned on, a solution of negatively charged protein moves 
towards the anode. Interfaces between the electrode buffer and pro- 
tein solution form ascending and descending boundaries as shown 


by arrows. The rate of migration of these ‘moving boundaries’ may 
be measured. 


scientist Arne Tiselius developed an apparatus which could 
be used to measure u values of proteins (see Figure 5.2). In 
this system, proteins moved towards the electrode of oppo- 
site charge as an ascending and descending boundary in a 
U-shaped tank. While this could separate proteins into 
moving boundaries allowing mobility measurements to be 
made, the resolution of the system was extremely low, 
large volumes of protein were required and the apparatus 
used was quite cumbersome. Zone electrophoresis was then 
introduced which involved separation of biomolecules in 
some form of solid support. Examples of these supports 
include paper, cellulose and gel networks. These supports 
have considerable mechanical strength and allow staining of 
biomolecules after separation. Most modern electrophoretic 
separations take place in zonal gel electrophoresis systems. 
More recently, it has been found that electrophoresis in very 
thin capillaries allows the use of higher than normal voltages 
with efficient heat dissipation giving very high resolution. 
Capillary electrophoresis is a versatile analytical technique 
capable of separating a wide range of biological samples 
such as biomolecules, viruses and even whole cells (Section 
5.10). A wide range of zonal and free solution systems are 
possible in capillary electrophoresis making an interesting 
historical link with the pioneering work of Tiselius. 


5.1.3 Gel Electrophoresis 


Hydrated gel networks have many desirable properties for 
electrophoresis. They allow a wide variety of mechanically- 
stable experimental formats such as horizontal/vertical elec- 
trophoresis in slab gels or electrophoresis in tubes or 
capillaries. Their mechanical stability also facilitates post- 
electrophoretic manipulation making further experimenta- 
tion possible such as blotting, electroelution or MS iden- 
tification/fingerprinting of intact proteins or of proteins 
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digested in gel slices (Chapters 4 and 10). Since gels used 
in biochemistry are chemically rather unreactive, they in- 
teract minimally with biomolecules during electrophoresis 
allowing separation based on physical rather than chemical 
differences between sample components. Highly-controlled 
procedures are available for the formation of gels of a nar- 
row range of porosity. This means that an individual gel 
can be prepared containing pores which will allow only 
molecules of a defined maximum mass to pass through, 
while excluding larger masses. Increasing or decreasing the 
size of these pores alters the mass which can be selected by 
the gel. Where the electrophoretic mobility of a biopolymer 
is directly related to gel porosity such gels are called restric- 
tive gels. Most hydrated polymers used in electrophoresis 
applications in biochemistry also have good anticonvection 
properties which enhance resolution. 

Gel networks may be formed from polymers which are 
crosslinked or noncrosslinked (the latter have proved par- 
ticularly useful in capillary electrophoresis; Section 5.10). 
Carbohydrate polymeric materials such as starch were his- 
torically used to form gel networks for the separation of 
proteins (Figure 5.3). There is extensive hydrogen bonding 
in these polymers and, when boiled and then cooled, sus- 
pensions will ‘set’ into a gel similar to those used widely in 
food systems. The gel network arises from a rearrangement 
of the hydrogen bonding pattern to form extensive inter- 
chain hydrogen bonds. It is difficult to control this process 
accurately, however, to ensure uniformity of porosity. More- 
over, the presence of carboxyl, sulfate and other chemical 
groups in starch can cause electroosmosis (Section 5.10) and 
ionic interactions. In starch gel electrophoresis of proteins, 
these effects result in low-resolution electrophoresis with 
broad bands. The most widely-used polysaccharide gel ma- 
trix nowadays is that formed with agarose. This is a polymer 
composed of a repeating disaccharide unit called agarobiose 
which consists of galactose and 3, 6-anhydrogalactose (Fig- 
ure 5.3). Agarose gives a more uniform degree of porosity 
than starch and this may be varied by altering the starting 
concentration of the suspension (low concentrations give 
large pores while high concentrations give smaller pores). 
This gel has found widespread use especially in the sepa- 
ration of DNA molecules (although it may also be used in 
some electrophoretic procedures involving protein samples 
such as immunoelectrophoresis;, Section 5.7). Because of 
the uniform charge-distribution in nucleic acids, it is possi- 
ble accurately to determine DNA molecular masses based 
on mobility in agarose gels. This application is described in 
more detail in Section 5.8. The limited mechanical stability 
of agarose, while sufficient to form a stable horizontal gel 
compromises the possibilities for postelectrophoretic ma- 
nipulation, however. 

A far stronger gel suitable for electrophoretic separation 
of both proteins and nucleic acids may be formed by the 
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Figure 5.3. Gels commonly used in electrophoresis of proteins and nucleic acids, (a) Polysaccharide gels are formed by boiling followed 
by cooling. Rearrangement of hydrogen bonds gives interchain crosslinking. (b) Agarose is composed of agarobiose. (c) Polymerisation of 
acrylamide to form polyacrylamide gel. The polymerisation reaction is initiated by persulphate radicals and catalysed by TEMED. 


polymerization of acrylamide (Figure 5.3). The inclusion of 
a small amount of acrylamide crosslinked by a methylene 
bridge (N,N’ methylene bisacrylamide) allows formation of 
a crosslinked gel with a highly-controlled porosity which 
is also mechanically strong and chemically inert. For sepa- 
ration of proteins, the ratio of acrylamide: N,N’ methylene 
bisacrylamide is usually 40: 1 while for DNA separation it 
is 19:1. Such gels are suitable for high resolution separa- 
tion of DNA and proteins across a large mass range (Table 
5.1). A wider range of molecular mass in an individual gel 
is achievable by the use of gradient gels. These consist of a 
gradient of polyacrylamide (e.g. 5—20%) which is prepared 
as described in Figure 5.4. 

Once sample molecules have separated in the gel matrix 
it is necessary to visualize their position. This is achieved by 


staining with an agent appropriate for the sample. Some of 
the more common staining methods used in biochemistry are 
listed in Table 5.2. DNA molecules are easily visualized un- 
der an ultraviolet lamp when electrophoresed in the presence 
of the extrinsic fluor ethidium bromide (Table 3.4). Alter- 
natively, nucleic acids can be stained after electrophoretic 
separation by soaking the gel in a solution of ethidium bro- 
mide. When intercalated into double-stranded DNA, fluo- 
rescence of this molecule increases greatly (Chapter 3; this 
molecule also stains single-stranded nucleic acids albeit not 
so strongly). This is also the basis of the undesirable mu- 
tagenesis and carcinogenesis properties of ethidium which 
makes caution necessary when using it in electrophoretic 
staining. It is also possible to detect DNA with the ex- 
trinsic fluor /-anilino 8-naphthalene sulphonate (ANS; 


Table 5.1. Range of separation of DNA and proteins in polyacry- 
lamide gels of different polyacrylamide concentration 


Range of separation 


Acrylamide concentration 


% (wiv) DNA (bp) Protein (kDa) 
3.5 1000-2000 — 

5 80-500 >1000 

8 60—400 300-1000 
12 40-200 50-300 

15 20-150 10-80 

20 5-100 5-30 


Note: Proteins separated in the presence of SDS (see Section 5.3.1). 


Table 3.4). Protein is usually stained with the dye coomassie 
blue. Although most proteins stain with this dye, it should be 
noted that not all proteins take up the dye with equal affinity 
and care should be taken when trying to obtain quantitative 
information from such stains. Less sensitive protein dyes 
include ponceau red and amido black. Ponceau red has the 
advantage that it stains reversibly and may be removed from 
the protein to allow subsequent analysis (e.g. immunostain- 
ing). The most sensitive staining method for protein is silver 
staining. This involves soaking the gel in AgNO; which re- 
sults in precipitation of metallic silver (Ag®) at the location 
of protein or DNA forming a black deposit in a process sim- 
ilar to that used in black-and-white photography. Because 
of the multiple washing steps involved, however, this stain- 
ing method is much more tedious than dye staining and is 
normally used to detect protein bands not visible by other 
methods. Radioactive proteins/nucleic acids can be visu- 
alized after electrophoretic separation by autoradiography. 


5% 20% 


20% 


Figure 5.4. Gradient gel electrophoresis. A polyacrylamide gra- 
dient gel is prepared by mixing solutions of the two percentages 
required as shown. The 20% solution contains glycerol which sta- 
bilises the gradient until polymerisation is complete. These gels are 
used to achieve separation of proteins across a wide molecular mass 
range. 
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Table 5.2. Commonly used stains for biopolymers after elec- 
trophoretic separation in agarose or polyacrylamide gels 


Detection 
Stain Use limit’ (ng) 
Amido black Proteins 400 
Coomassie blue Proteins 200 
Ponceau red Proteins 200 
(reversible) 
Bis-1-anilino-8-Naphthalene Proteins 150 
sulphonate (ANS) 
Nile Red Proteins 20 
(reversible) 
SYPRO orange Proteins 10 
Fluorescamine (protein Proteins 1 
treated prior to 
electrophoresis) 
Silver chloride Proteins/DN 1 
SYPRO red Proteins 0.5 
Ethidium bromide DNA/RNA 10 


Note: Some of these stains may also be applied to molecules after 
transfer to nitrocellulose or other membranes (i.e. blotting; see 
Section 5.11). 

“These limits of detection should be regarded as approximate since 
individual proteins may stain more or less intensely than average. 


The dried gel is clamped on a piece of X-ray film and placed 
at —70°C (to stop band diffusion). On developing the film, 
bands become visible. 

Despite the possibility that different sample components 
may stain to different extents, it is often possible to obtain 
quantitative information from electrophoresis gels. If the 
bands are labelled with radioisotopes, they may be cut out 
and counted in a scintillation counter. Alternatively, a va- 
riety of image analysis procedures are now possible with 
the advent of computers equipped with a scanner or video 
camera and appropriate image analysis software. These al- 
low analysis of the gel and recording of results as a digital 
signal. A densitometer (a type of spectrophotometer; Chap- 
ter 3) may be used to measure absorbance at a particular 
wavelength along each track of the gel. This generates a 
trace similar to a HPLC chromatogram in which peaks of 
absorbance represent bands of protein/DNA in the gel. This 
process is called densitometry and allows comparison of one 
track with another and of one gel with another. 

While electrophoresis at continuous pH is possible, im- 
proved resolution in polyacrylamide gel electrophoresis is 
obtained in a discontinuous system consisting of two gels 
called the stacking gel and resolving gel which are held at 
pH 6.9 and 8-9, respectively (Figure 5.5). The purpose of 
the stacking gel (sometimes also called spacer gel) is con- 
centration of sample into thin bands or ‘stacks’ which can 
accumulate at the interface between the two gels prior to 
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Figure 5.5. Discontinuous polyacrylamide gel electrophoresis. Sample is introduced into small (e.g. 50 ul) ‘wells’ in the stacking gel which 
has a different pH, ion strength and polyacrylamide concentration of the resolving gel. When current is turned on, protein/nucleic acids 
‘stack’ by a process of ‘isotachophoresis’ as they pass through the stacking gel (see Section 5.3.1). At the interface between the stacking 
and resolving gels, a process of destacking begins (i.e. separation of individual protein/DNA molecules). Sample components continue to 
separate as they pass through the resolving gel due to differences in mass/charge, shape, etc. Adaptation of Figure 2 from Cruz et al. (1998) 
Biochemica et Biophysica Acta, 1415, 125-134, with permission from Elsevier Science. 


separation. This gel has a low concentration of polyacry- 
lamide (3-5%), low ionic strength and a pH near neutrality. 
The resolving gel, by contrast, is composed of a higher 
percentage of acrylamide (8—20%), higher ionic strength 
and an alkaline pH. This gel achieves separation of sample 
molecules stacked at the interface. The lower ionic strength 
of the stacking gel causes higher electrical resistance and 
hence stronger electrical field strength in this gel compared 
to the resolving gel. This means that, at a given voltage, 
samples have higher mobility in the stacking compared to 
the resolving gel (Equation (5.1)). 

Sample is applied to the stacking gel in the electrophore- 
sis or running buffer composed of Tris-glycine at pH 8-9. 
Glycine exists in the form of a mixture of anion and zwitte- 
rion at this pH; 


COO +H* 
(5.7) 


NHt — CH, — COO7 


Zwitterion form 


> NH? CH, 


Anion form 


As the aliquot of sample enters the stacking gel at pH 
6.9, the balance of equilibrium shifts strongly towards the 
uncharged zwitterion form which has no electrophoretic 


mobility. As mentioned above, buffer ions are required to 
carry current in electrophoresis systems (Section 5.1.1). 
To maintain a constant current, a flow of anions is nec- 
essary. At pH 6.9, most proteins and nucleic acids are an- 
ionic and these, together with chloride, replace glycinate 
as mobile ions. In practice, protein/nucleic acid ions be- 
come ‘sandwiched’ between chloride (high mobility) and a 
small amount of glycinate ions (low mobility) as they move 
quickly through the stacking gel, unimpeded by the large 
pores due to low polyacrylamide concentration. This phe- 
nomenon of ‘sandwiching’ between ions of different elec- 
trophoretic mobility is called isotachophoresis (for a more 
detailed description of this see Section 5.10.1). Chlorine is 
a very strongly electronegative atom which moves towards 
the anode with much greater velocity than any other species 
present. This band of anions leaves behind it a zone of low 
conductivity. As this zone passes through the rest of the 
sample, molecules in the sample become sorted on the ba- 
sis of their charge from most to least negatively charged. 
Simultaneously, they are concentrated based on this charge 
discrimination. This stacks the sample into a number of 
thin layers. When this front of ions reaches the interface 


with the resolving gel (pH 8-9), however, the glycinate 
ion concentration increases dramatically (Equation (5.7)) 
and now carries the bulk of the current. At the same time, 
protein/nucleic acid ions encounter a higher concentration of 
polyacrylamide with a narrow pore size and a more alkaline 
pH. The thin stacks of protein/nucleic acid therefore sepa- 
rate in this gel depending on their mass/charge and shape 
characteristics as described in Section 5.1.1. 

In most of the applications described in this chapter, elec- 
trophoresis is used as a high-resolution analytical proce- 
dure. However, it is also possible to recover protein, peptide 
or DNA samples from gels after electrophoretic separation. 
This preparative procedure usually involves cutting out a 
slice of gel containing a single molecular species and re- 
covering the protein/DNA by exposing the gel-slice to an 
electrical field. This process is called electroelution and it is 
suitable for the purification of small quantities of material 
for further analysis. This approach is especially appropriate 
for samples difficult to purify by conventional chromato- 
graphic methods (Chapter 2). 


5.2 NONDENATURING 
ELECTROPHORESIS 


5.2.1 Polyacrylamide Nondenaturing Electrophoresis 


The gel network formed by polyacrylamide is a suitable 
environment for the electrophoretic separation of proteins 
in their native state, that is under nondenaturing condi- 
tions. In such conditions, the protein is regarded as being in 
its intact fully-active form with the correct secondary, ter- 
tiary and quaternary structure. It separates on the basis of 
intrinsic charges on amino acid side-chains and other groups 
located on the protein surface. Since the precise number and 
strength of positive and negative charges will vary from pro- 
tein to protein, each will have a characteristic mobility in a 
nondenaturing system determined by a combination of these 
charges together with physical characteristics of the protein 
such as mass and shape. 


5.2.2 Protein Mass Determination by Nondenaturing 
Electrophoresis 


Non-denaturing electrophoresis allows the determination 
of protein native molecular mass (M,). Since mobility is 
strongly affected by mass and shape, it is possible to com- 
pare the mobility of a protein of unknown mass with a series 
of standard proteins of similar shape but known mass. This 
mobility is measured in a number of nondenaturing gels each 
of differing polyacrylamide concentration called Ferguson 
gels (Example 5.1). The mobility of each protein (standards 
and unknown) is measured and expressed as Ry. A plot of 
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log Rr for each standard protein versus % polyacrylamide is 
then made (it is necessary to use a minimum of five differ- 
ent polyacrylamide concentrations for accurate results). The 
slope of these plots is related to the retardation coefficient, 
K,, for that protein as; 


K, = —(slope) (5.8) 


There is a direct relationship between K, and molecular 
mass of molecules of similar general shape. In the case of 
globular proteins, there is a linear relationship between K, 
and the radius of the globule shape, r (Stokes radius). By 
plotting K, versus M,, a standard curve may be constructed 
from which it is possible to estimate native molecular mass. 
This data complements mass estimates from techniques such 
as gel filtration (Chapter 2) and is especially useful when 
only small amounts of protein are available. By combining 
such measurements with M, estimates under denaturing con- 
ditions (Section 5.3), the quaternary structure of a protein 
can be routinely determined. 


5.2.3 Activity Staining 


Since proteins are separated by nondenaturing electrophore- 
sis in their native form, they generally retain biological ac- 
tivity. In the particular case of enzymes, catalytic activity 
can be used as a specific stain for the enzyme in a process 
called activity staining. This involves use of an artificial 
substrate which is converted to an insoluble, coloured prod- 
uct by the catalytic cycle. The location of the protein in 
the gel can be visualized by observation of the insoluble 
(and therefore precipitated) product. Since a single enzyme 
may go through many thousands of catalytic cycles in a few 
minutes, very tiny amounts of enzyme can be visualized in 
this way. 

Although in principle an individual activity stain is re- 
quired for each enzyme, in practice particular activity stains 
are available for groups of enzymes sharing a common cat- 
alytic process. For example, many dehydrogenases catalyze 
formation of NADH from NAD. This can be used to reduce 
tetrazolium to produce insoluble formazan (Figure 5.6). If, 
after nondenaturing electrophoresis, a gel is soaked with 
tetrazolium together with appropriate substrates and cofac- 
tors for a particular dehydrogenase, formazan will only form 
where the enzyme is located in the gel. By selecting sub- 
strates specific for the particular dehydrogenase under inves- 
tigation (e.g. succinate for succinate dehydrogenase, malate 
for malate dehydrogenase) highly individual staining for 
dehydrogenases can be achieved using essentially the same 
staining procedure. 

This type of experiment is useful in identifying which 
band among several visible on a nondenaturing gel is 
the band representing the enzyme of interest. It is also 
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Example 5.1 Determination of protein native mass by nondenaturing electrophoresis 


When working with a newly-purified protein, we may need to know its quaternary structure (i.e. is it a monomer, dimer, 
etc.?). Non-denaturing electrophoresis is particularly useful for this determination in situations where availability of 
protein is limited (<500 ug). In restrictive polyacrylamide gels, the electrophoretic mobility of individual proteins (Rf) 
will vary in a straight-line manner with respect to the porosity of the gel (i.e. % acrylamide). 

For a protein of unknown molecular mass (subunit molecular mass; 50 kDa) and some standard proteins of known 
mass, Rf values were determined in a nondenaturing system using gels of 5, 7.5, 10, 12.5 and 15% polyacrylamide . The 
standard proteins were a-lactalbumin (14.2 kDa), carbonic anhydrase (29 kDa), ovalbumin (45 kDa) and urease (trimeric 
form; 272 kDa). A Ferguson plot of log Rf versus % acrylamide was prepared as follows; 


oc 

2 a@-Lactalbumin 

x 

e Carbonic anhydrase 
= Ovalbumin 


Unknown sample 


Urease (trimer) 


2.5 5 7.5 10 12.5 15 17.5 
% acrylamide 


The Ferguson plot demonstrates the linear relationship between Rf and gel porosity mentioned above. 

A slope value was calculated for each protein from which the retardation coefficient (K,) is determined as — (slope). 
There is a direct relationship between K, and the molecular radius (Stoke’s radius) of the protein which allows us to 
calculate molecular mass (M,). Since we know the K, of a number of standard proteins (of known M,) as well as of 
the unknown sample, a standard curve of K, versus molecular mass can be plotted. From this plot, the native mass of 
the protein was measured as 108 kDa. Since the subunit M, is 50 kDa, it could be deduced from this experiment that the 
unknown protein was a dimer. 


Standard curve of retardation coefficients from Ferguson Plot. 


Unknown sample 


0 50 100 150 200 250 300 


(DATA COURTESY OF DR. VIVIENNE FOLEY). 
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Figure 5.6. Activity staining for dehydrogenases. MTT is a tetrazolium salt which may be reduced in a reaction catalysed by dehydrogenases 
to an insoluble formazan product which has two resonance hybrid forms stabilised by an internal hydrogen bond (dashed line). 


useful in demonstrating the presence of multiple enzymes 
catalyzing a particular chemical reaction called isoenzymes 
(Example 5.2). 


5.2.4 Zymograms 


Many enzyme activities are expressed as isoenzymes and 
there is often a characteristic pattern of expression in 
different tissues, individuals, populations or in samples 
taken at different stages of development. Examples of the 
use of these patterns include diagnosing the onset of cancer 
in cultured cells and the mixing of genetically-distinct 
biological populations such as farmed and wild salmon. 
Activity-stained nondenaturing gels are called zymograms 
and these may be used to characterize cell-types, tissues, 
individuals and populations on the basis of isoenzyme 
expression (Example 5.2). 


5.3 DENATURING ELECTROPHORESIS 


Biopolymers such as proteins and nucleic acids are folded 
into compact structures held together by a variety of non- 
covalent, ionic interactions such as hydrogen bonding and 
salt bridges. We have seen that, under nondenaturing condi- 
tions, such samples migrate in a manner determined largely 


by their intrinsic electrical charge. However, it is possible to 
disrupt these ionic interactions to denature the biopolymers 
and then to separate them electrophoretically by denaturing 
electrophoresis. Such experiments are especially useful in 
the study of proteins since they have a more varied range 
of tertiary structure than nucleic acids. The electrophoretic 
mobility of the denatured molecule will be changed, com- 
pared to that in nondenaturing conditions, due to altered 
charge and/or shape since it now migrates as an unstruc- 
tured monomer through the electrical field. Any biological 
activity or quaternary structure associated with the sample 
components is lost in denaturing electrophoresis. However, 
important structural information can be obtained especially 
about proteins as described in the following sections. 


5.3.1 SDS Polyacrylamide Gel Electrophoresis 


The detergent sodium dodecyl sulphate (SDS) consists of a 
hydrophobic 12-carbon chain and a polar sulfated head. The 
hydrophobic chain can intercalate into hydrophobic parts 
of the protein by detergent action, disrupting its compact 
folded structure (Figure 5.7). The sulfated part of the de- 
tergent remains in contact with water thus maintaining the 
detergent-protein complex in a highly-soluble state. This in- 
teraction disrupts the folded structure of single polypeptides 
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Example 5.2 Use of nondenaturing electrophoresis for zymogram analysis 


Proteins retain their intact native structure when electrophoresis is performed under nondenaturing conditions. This means 
that enzymes are usually fully active even after electrophoretic separation. Frequently, enzymes are present in multiple 
forms or isoenzymes in cells. These proteins may be variants arising from a range of post-translational modifications 
(e.g. glycosylation) or may be distinct products of specific genes. After electrophoretic separation, instead of staining the 
gel with nonspecific dyes such as coomassie blue, we can often activity stain with enzyme substrates which are specific 
for the group of enzymes under investigation. This is called zymogram analysis and it facilitates analysis of individual 
isoenzymes in cell extracts without the need for their prior purification. Isoenzyme expression patterns from such studies 
can give us useful comparative information on the genetic composition of natural populations of animals, differences 
between species, tissue-specific enzyme expression within a single individual and localized expression within a single 
tissue. 

A good example of an enzyme present in multiple forms in cells is lactate dehydrogenase (LDH) which catalyses 
the reduction of pyruvate under anaerobic conditions. This tetrameric enzyme is composed of two types of subunit 
polypeptides designated H and M. LDH from heart and muscle are composed of four H and M subunits, respectively, but 
five isoenzymes are possible in vertebrate liver (H4, HM3, H2M>2, HM3, M4). These isoenzymes are designated LDH-1 
to LDH-5, respectively, and are separable by electrophoresis in polyacrylamide gels. Specific activity staining of these 
enzymes is possible by the following method; 


OH (0) 
| II 
COO -CH - CH, Coo“ -C - CH, 
Lactic acid Pyruvate 
NAD+ NADH + H* 


aZ 


Nitro blue 


Formazan 
(insoluble) 


tetrazolium 
chloride 


Wherever LDH isoenzymes are located in the gel, they will produce the coloured formazan product (this type of 
reaction is widely-used in activity staining of dehydrogenases generally). Since this compound is insoluble, it will not 
diffuse away from the location to which the isoenzyme has electrophoresed and this is an important feature of most 
activity stains. The pattern of isoenzyme expression determined for liver samples from pig (1), cattle (2), rabbit (3), rat 
(4) and human (5) is shown below; 


Using this method with ng-sized microdissected samples, it was possible to identify differential expression of isoen- 
zymes in the basic functional unit of liver, the acinus. The level of the M-subunit was found to be consistently higher in the 
periportal part of the acinus than in the perivenous part in all these species while the H-subunit showed species-specific 
variation. These results are important in understanding the contribution of LDH to anaerobic metabolism in this vital 
tissue. 

Reproduced from Maly, I.P. and Toranelli, M. (1993), Ultrathin-layer zone electrophoresis of lactate dehydrogenase 
isoenzymes in microdissected liver samples. Analytical Biochemistry, 214, 379-388. Copyright © 1993, Academic 
Press. 
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Figure 5.7. Interaction of protein with sodium dodecyl sulphate, 
(a) Sodium dodecyl sulphate has a hydrophilic head and a hydropho- 
bic tail, (b) The detergent interacts with folded protein by coating 
hydrophobic regions of the polypeptide conferring negative charges 
on them. This disrupts both subunit—subunit and protein—membrane 
interaction. 


as well as subunit-subunit (i.e. quaternary structure) and 
protein-membrane interactions for most proteins. SDS coats 
proteins with a uniform ‘layer’ of negative charges which 
causes them to migrate towards the anode when placed in an 
electrical field, regardless of the net intrinsic charge of the 
uncomplexed proteins. The negative charge gives a charge- 
density largely independent of the primary structure or mass 
of the polypeptide. For this reason, there is a close rela- 
tionship between the mobility of SDS-protein complexes in 
polyacrylamide gels and the molecular mass of the polypep- 
tide (Figure 5.8). This type of experiment is called SDS poly- 
acrylamide gel electrophoresis or SDS PAGE and it is used 
widely to obtain mass and purity estimates for polypeptides. 
A further application of this technique is to visualize pep- 
tide maps arising from proteolytic or chemical digestion of 
a pure sample. This latter application complements HPLC 
(Chapter 2) and mass spectrometry (Chapter 4) analyses. 
Despite its widespread use and versatility, SDS PAGE 
has a number of important limitations. It is assumed that 
proteins electrophorese as perfect spheres with a uniform 
charge-distribution. If proteins deviate from a globular shape 
or if they bind above- or below-average amounts of SDS, 
they may behave in a nonideal way and inaccurate mass 
estimates will result. For this reason, it is important to 
compare unknown proteins to appropriate standard pro- 
teins in SDS PAGE (i.e. globular unknowns with globu- 
lar standards and fibrous unknowns with fibrous standards). 
Mass estimates obtained by this technique are often referred 
to as the apparent molecular mass since they depend on 
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Figure 5.8. SDS PAGE of a protein of unknown molecular mass. 
Standard proteins of known molecular mass (M,) were applied 
to a 15% SDS PAGE gel (track 1) as follows; bovine serum al- 
bumin (68 kDa), ovalbumin (45 kDa), glyceraldehyde 3-phosphate 
dehydrogenase (36 kDa), carbonic anhydrasc (29 kDa), trypsinogen 
(24 kDa), soya bean trypsin inhibitor (20.1 kDa) and @-lactalbumin 
(14.2 kDa). A sample protein of unknown mass was simultaneously 
applied to track 2 and, after electrophoresis (3 h at 175 V), the re- 
solving gel was stained with coomassie blue. This visualised both 
standard and sample proteins. Bromophcnol blue was included as a 
‘tracker dye’ which shows the front of clcctrophorcsis. The distance 
each protein migrates into the gel is measured and M, of sample 
protein may be determined from a plot of this distance versus log 
M, of standards. 


comparison with other proteins rather than on direct mea- 
surement such as determination of complete amino acid se- 
quence. Possible effects of post-translational modifications 
on protein mobility in SDS PAGE gels should also be con- 
sidered. For example, glycoproteins migrate slower in such 
systems because SDS does not bind to sugar moieties. 


5.3.2 SDS Polyacrylamide Gel Electrophoresis in 
Reducing Conditions 


A number of variants of SDS PAGE are possible by appro- 
priate chemical pretreatment of the protein sample. Many 
proteins, especially those secreted outside the cell such as 
immunoglobulins, contain disulphide bridges as a distinct 
structural feature. They are formed post-translationally be- 
tween cysteine side-chains, adding greatly to the stability 
of the protein. In order to understand the complete covalent 
structure of a protein, it is necessary to know the num- 
ber and location of these disulphide bridges. By pretreat- 
ing protein samples with chemical reducing agents such 
as 2-mercaptoethanol or dithiothreitol, it is possible to re- 
duce disulphides, breaking the links between individual 
polypeptides (Example 5.3). In the presence of SDS, such 
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Example 5.3 SDS PAGE in reducing conditions 


Disulphide bridges are formed between cysteine side-chain sulphydryl groups either within a single polypeptide (intra- 
chain) or between different polypeptides (interchain) in oligomeric proteins. They are important structural features of 
proteins because they frequently maintain the protein in a particular shape and usually add to its stability. Disulphide 
bridges can be reduced to sulphydryls by treatment with reducing agents such as 2-mercaptoethanol or dithiothreitol. In 
order to understand fully the covalent structure of a protein, we need to identify its pattern of disulphide bridging. 

A 100kDa protein is a trimer composed of two molecules of 25 kDa and a single polypeptide of 50 kDa, giving an 
apparent native M, estimate of 100 kDa. Treatment with B-mercaptoethanol disrupts disulphide bridges as shown below. 
SDS PAGE analysis of this protein in the presence (+) and absence (—) of 2-mercaptoethanol reveals a single pattern of 
disulphide bridging in this protein. 


SDS PAGE analysis of protein under reducing conditions 
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polypeptides then migrate individually in an electrophoresis A related and complementary nondenaturing elec- 
system. Analysis of mass estimates in the presence and ab- trophoretic technique allows counting of integral numbers 
sence of reducing agents allows deductions about disulphide of cysteines in proteins by pretreatment with mixtures 


bridging in sample proteins to be made. of iodoacetic acid and iodoacetamide (Figure 5.9). This 
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Figure 5.9. Counting protein sulphydryl (-SH) groups by non-denaturing electrophoresis. (a) -SH groups react with both iodoacetic acid 
(giving a negative charge) or iodoacetamide (uncharged). Negatively charged proteins migrate towards the anode, (b) Treatment of sample 
protein (containing three -SH groups) with an increasing ratio of iodoacetic acid/iodoacetamide (1-4), gives a series of differently charged 
proteins with zero to three negative charges. ‘Mix’ contains a mixture of aliquots from tracks 1-4. Counting of bands in this track gives the 


number of -SH groups in the protein. 


information can be useful as a first step in revealing the 
disulphide distribution pattern within a protein. 


5.3.3 Chemical Crosslinking of 
Proteins — Quaternary Structure 


Quaternary structure describes the arrangement of individ- 
ual polypeptide chains in oligomeric proteins (i.e. proteins 
consisting of more than one polypeptide). We have already 
seen that secreted proteins such as immunoglobulins of- 
ten consist of several polypeptide chains or subunits linked 


NH) 


TEE 


together by disulphide bridges. Other oligomeric proteins 
are held together by noncovalent interactions, especially 
hydrophobic forces. Such structures are often quite unstable 
and may be difficult to study by conventional methods such 
as X-ray crystallography (Chapter 6). One experimental ap- 
proach to this problem takes advantage of the fact that the 
amino acid lysine, which has a chemically-reactive e-NH2 
group, is quite abundant in proteins (an average of ~7% of 
total amino acid residues). This group will react readily with 
bifunctional reagents such as dimethyl suberimidate (Figure 
5.10). Polypeptides adjacent in the quaternary structure of a 


C-(CH,),-C 
+ ——— ie XN 
NH NH 
NH,* C NH, CI \ 


i i 
CH;0 - C — (CH3) — C - OCH; 


Dimethyl suberimidate 


Crosslinked polypeptides 


Figure 5.10. Cross-linking of polypeptides with bifunctional reagents. Lysine e-NH2 groups of adjacent polypeptides may be crosslinked 
via a bifunctional reagent such as dimethyl suberimidate. Crosslinked proteins will electrophorese in SDS PAGE to an apparent mass equal 


to that of the sum of the crosslinked polypeptides. 
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Figure 5.11. Urea as a protein denaturant. (a) Urea is a net uncharged molecule which is nonetheless polar in nature with positive and 
negative dipoles as shown (6). (b) At high concentrations, urea can interrupt hydrogen bonding between amide and carbonyl groups of 


proteins. This leads to unfolding of the protein. 


large protein or complex will be covalently crosslinked by 
this reaction and will electrophorese as a single molecule in 
SDS PAGE. 


5.3.4 Urea Electrophoresis 


One of the great disadvantages of SDS as a denaturing 
agent is the fact that it binds tightly to protein and can 
be very difficult to remove after electrophoresis. An alter- 
native denaturation strategy is offered by the chaotropic 
agent, urea. We have previously seen (Chapter 2) how this 
molecule may be used in ion exchange chromatography of 
peptides. Urea (Figure 5.11) has the useful property that, 
although it possesses zero net charge, it is nonetheless a 
polar molecule with unequal internal charge distribution. 


Protein secondary structure depends on hydrogen bonding 
between the amide and carbonyl groups of peptide bonds. 
Similar hydrogen bonds are also involved in stabilization 
of tertiary and quaternary structure. High concentrations 
of urea interrupt these hydrogen bonds, leading to a com- 
plete disruption of secondary, tertiary and quaternary struc- 
ture. Urea renders polypeptides highly water-soluble but, 
because it is uncharged, it does not migrate in electrical 
fields. Proteins therefore electrophorese in the presence of 
urea as determined by their net intrinsic charge, despite 
the fact that they are denatured in these conditions. Urea 
also features in DNA electrophoresis in situations where 
single strands are required. As with proteins, urea dis- 
rupts the hydrogen bonding responsible for DNA secondary 
structure. 


5.4 ELECTROPHORESIS IN 
DNA SEQUENCING 


DNA is along rod-like molecule which moves through poly- 
acrylamide gels with limited mobility. Moreover, DNA has 
a uniform charge-distribution which means that this mobil- 
ity is directly proportional to the size of the molecule. For 
these reasons, electrophoresis of DNA in polyacrylamide 
gels allows separation of molecules differing by as little as 
a single nucleotide. Advantage has been taken of this to 
develop sequencing strategies for the study of DNA. 


5.4.1 Sanger Dideoxynucleotide Sequencing of DNA 


In Sanger dideoxynucleotide sequencing, the sequence of 
a single DNA strand is determined. Where double stranded 
DNA is used in the reaction, the strand complementary to the 
strand to be sequenced must be displaced in some way before 
sequencing. Double stranded DNA is normally prepared by 
the polymerase chain reaction (PCR) which generates many 
copies of a double stranded template. In DNA sequencing, 
single stranded DNA functions as a template for replication 
of a series of daughter strands using DNA polymerase. 

In order to obtain large amounts of single stranded DNA 
the M13 bacteriophage system is used (Figure 5.12). This 
is a virus capable of infecting bacterial cells such as E. coli. 
Outside of such cells, the genome of the virus consists of ap- 
proximately 7 kb single stranded DNA. When M13 infects 
bacteria, this DNA is replicated into a double stranded form 
called the replicative form (RF). Once approximately 100 
copies of RF have accumulated, synthesis switches to pro- 
duction of single stranded DNA which is eventually pack- 
aged with protein and extruded from the cell. Approximately 
1000 such viruses may be released into the medium in a sin- 
gle cell generation. This life cycle means that M13 DNA 
may be isolated in either single or double stranded form and 
can be handled, respectively, as a virus or as a plasmid. Var- 
ious modifications have been made to wild-type M13 such 
as the inclusion of a polylinker site and of the Lac Z gene. 
These factors make M13 a particularly versatile system for 
replicating single stranded DNA fragments. 

DNA to be sequenced is cloned into the polylinker site of 
the RF form of M13 and used to transform lac” E. coli cells 
in the same manner as with bacterial plasmids. Recombi- 
nants are identified as white colonies and these are grown 
up in broth. Several copies of the recombinant RF form of 
M13 are made in the E. coli cell at first, but then synthesis 
switches to the single stranded form which eventually pre- 
dominates. Single stranded DNA collects at the periplasm 
of the cell and is combined with coat proteins to form M13 
bacteriophage. This is extruded from the cells without ly- 
sis and is collected by centrifugation. Single stranded DNA 
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is prepared from the virus by precipitation of coat proteins 
with polyethylene glycol followed by DNA extraction with 
phenol/chloroform and precipitation with ethanol. 

A short oligonucleotide called a primer, complementary 
to an M13 sequence near the origin of replication, is la- 
belled radioactively at the 5’ end which allows detection of 
each daughter strand and ensures that the sequence is read 
in one direction (5’ to 3’). A small amount of competitive in- 
hibitors of DNA polymerase, dideoxyadenine, dideoxygua- 
nine, dideoxythymidine or dideoxycytosine (ddNTPs) is 
included in each of four separate reactions, together with 
substrates for this enzyme; dATP, dGTP, dCTP and dTTP 
(dNTPs; Figure 5.13). The precise amount of dideoxynu- 
cleotide used may need to be optimized but it is usually 
1: 100 relative to the dNTPs. DNA polymerase is then added 
and catalyses synthesis of a new daughter-strand until a 
molecule of inhibitor is incorporated terminating the syn- 
thetic reaction. Since the ddNTP lacks a 3’-hydroxyl group, 
it cannot form a 3’,5’-phosphodiester bond and the DNA 
chain cannot be extended further. Thus, this technique is 
often referred to as the chain termination procedure. For 
example, for every G residue occurring in the sequence, 
at least some of the time, a ddG becomes incorporated in 
the strand. The experiment therefore results in four sets of 
DNA daughter-strands which are truncated fragments com- 
plementary to the DNA to be sequenced. 


5.4.2 Sequencing of DNA 


The four mixtures are then electrophoresed in an unusually 
long (up to 48 cm) and thin (0.4mm) continuous polyacry- 
lamide (4—6%) gels in the presence of urea (8 M) and at ap- 
prox. 30-40°C. These conditions ensure that DNA strands 
migrate without formation of significant secondary struc- 
ture. A shark’s tooth comb is used to create small reservoirs 
orwells at the top of the gel. These form between the teeth 
of the comb and the gel surface. Individual samples may 
be applied to specific wells thus avoiding mixing with other 
samples (Figure 5.14). In the four mixtures DNA synthe- 
sis will have terminated at different positions as shown in 
Figure 5.13. 

These samples are electrophoresed side-by-side with one 
track for ddG, ddA, ddT and ddC, respectively. After elec- 
trophoresis, the gel is dried onto a sheet of paper which 
is then laid against X-ray film which detects the radiolabel 
which was incorporated into the primer at the beginning of 
the experiment. The sequence may be read from the bottom 
of the autoradiogram, beginning with the smallest fragment. 
It is important to understand that this sequencing method 
depends absolutely on the ability to separate DNA to a res- 
olution of one base-pair and is only possible because of the 
absence of secondary structure combined with the uniform 
charge-distribution provided by the phosphate group of each 
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Figure 5.12. Preparation of single-stranded DNA for sequencing. DNA to be sequenced is cloned into the RF form of M13. This is 
transformed into competent lac- E. coli and replicated first into several copies of RF M13 and then into many copies of single stranded M13. 
Coat proteins surround this DNA to form the mature bactcriophage which is released into the medium. Single stranded DNA is collected by 
phenol-chloroform extraction and ethanol precipitation. This is used as template for DNA sequencing. 
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Figure 5.13. DNA sequencing by chain termination (Sanger dideoxynucleotide sequencing), (a) DNA polymerase catalyses the synthesis 
of a DNA strand complementary to the DNA to be sequenced. When a ddNTP is incorporated, synthesis of the daughter strand ceases. 
P represents a radiolabel on the oligonucleotide primer (which may be replaced by a fluorescent label, see text), (b) A separate experiment 
is set up for each of four ddNTPs. This generates a set of truncated fragments in each test-tube. These are then electrophoresed through a 


polyacrylamide-urea gel. 


nucleotide. Near the top of the autoradiogram it will be diffi- 
cult to resolve slight differences in electrophoretic mobility 
between different DNA fragments and this represents the 
limit of reliable sequencing in this experiment. From the 
rules of complementary base pairing, it is elementary to 
deduce the sequence of the parent strand from that of the 
complementary strand and, in the case of coding DNA, to 
translate this into a deduced amino acid sequence. Since 
even a single error in sequence may result in large errors 


in deduced amino acid sequence it is usual to carry out se- 
quencing experiments for both complementary strands of 
parent DNA. 

Using this approach, it is possible manually to sequence 
lengths of DNA up to 400 bp in a single experiment. By 
using parts of the sequence near the 3’ end to design a new 
primer, a second sequence overlapping at its 5’ end with 
the 3’ end of the first sequence may be generated (Figure 
5.15). This strategy, which is called gene walking or primer 
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Figure 5.14. Autoradiogram of DNA sequencing gel. Truncated fragments for each of four chain termination reactions are separated in 
polyacrylamide gels containing 8 M urea. Smaller fragments electrophorese furthest into the gel (for clarity, only those for ddG are shown 
in the illustration but bands in the C, A and T lanes arise from corresponding fragment sets for ddC, ddA and ddT). Bands are visualised by 
autoradiography on X-ray film. The sequence obtained is complementary to the sequence of the template DNA strand. 


walking, allows progressive sequencing of large amounts 
of DNA. 

Another means of extending the amount of DNA se- 
quence information is automated DNA sequencing. Dye 
primerDNA sequencing (Figure 5.16) uses four distinct fluo- 
rescent compounds to label the primer at the 5’ end and, since 
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oe ee A Sequence 2 
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these labels can be distinguished from each other spectro- 
scopically, the four sets of truncated fragments can be mixed 
together immediately prior to electrophoresis and separated 
in a single electrophoresis track. This dispenses with the 
need for four different tracks per sample required for manual 
sequencing and also avoids the use of radioactivity. Larger 
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Figure 5.15. Primer walking approach for sequencing 1 kb DNA fragment. New oligonucleotide primers are designed from 3’ ends of 
previously sequenced region. In this way sequencing of a template can be continued until completed. 
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Figure 5.16. Automated DNA sequencing. Instead of a radiolabel, primer strands are labeled with a different fluor or each ddNTP. These 
have distinct excitation (A;) and emission (A2) wavelengths and can therefore be easily distinguished by the detector. A single track can be 


used for separation of all four ddNTP nested fragments. 


amounts of sequence data are therefore obtainable from each 
electrophoresis run. Moreover, since the fluors are detected 
as they pass near the bottom of the gel, longer sequences 
(up to 600 bp) can be read. A modification of this method 
is dye terminator DNA sequencing which involves incorpo- 
ration of fluorescent dyes in the ddNTP molecule. In this 
method, the four reactions may be performed in a single 
test-tube and electrophoresed in a single lane of the poly- 
acrylamide gel thus giving even better sample-throughput 
than dye primer DNA sequencing. For some applications, it 
may be preferred to use a single fluorescent dye (in place of 
the radiolabel shown in Figure 5.13) and four electrophore- 
sis tracks. Automatic sequencers have inbuilt facilities for 
storing and analyzing sequence data and may be interfaced 
with robots in large-scale sequencing projects such as the 
human genome project. 


5.4.3 Footprinting of DNA 


In eucaryotes, most DNA sequences are noncoding, that 
is they do not code for protein. Examples include satel- 
lite DNA, intervening sequences within genes (introns) and 
regulatory sequences. The latter control gene expression in 
eucaryotes and, for this reason, is widely studied. Good 
examples of these are promotor, enhancer and operator 
sequences which are targets for highly sequence-specific 
DNA-binding proteins (frequently called factors). Binding 
of protein at these sites is responsible for a regulatory effect 
involving turning expression of a specific gene on or off. 


A widely-used experiment allowing the identification of 
regulatory DNA sequences is DNA footprinting. This ex- 
periment (Figure 5.17) involves mixing genomic DNA (or 
smaller fragments derived from genomic DNA) with an ex- 
tract containing the protein factor under investigation fol- 
lowed by digestion with a small amount of DNA-ase I. Re- 
gions of DNA to which protein is bound are protected against 
DNA-ase digestion. The DNA is then separated from pro- 
tein, denatured with urea and analysed in polyacrylamide 
gels. Comparison of samples digested in the presence and 
absence of DNA-binding protein allows identification of 
DNA fragments for which the protein is specific. These 
can then be electroeluted, cloned and sequenced. Studies 
with a range of DNA-binding proteins has allowed the iden- 
tification of consensus recognition sites for such proteins 
and these allow screening of new DNA sequences to find 
putative regulatory sequences. An adaptation of DNA foot- 
printing allows simultaneous identification and sequencing 
of the regulatory site (Figure 5.18). 


5.4.4 Single Strand Conformation Polymorphism 
Analysis of DNA 


If double stranded DNA is first denatured into single 
stranded molecules and then placed in nondenaturing con- 
ditions, DNA molecules can adopt specific conformations 
as a result of sequence-specific intramolecular hydrogen 
bonding. Single stranded molecules of similar mass may 
show slight differences in their electrophoretic mobility as a 
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Figure 5.17. DNA footprinting. Genomic DNA is incubated with DNA-binding, regulatory protein followed by treatment with DNase I. 
Sufficient DNase I is added to give approximately one cleavage (\) per DNA strand. Protein is then removed and the DNA is denatured and 
electrophoresed in polyacrylamide gels. Sample containing the DNA-binding protein (+) is missing bands present in the control (—) sample 
not exposed to DNA-binding protein. These are called footprint bands and represent DNA sequences which are protected against DNase I 


digestion by the DNA-binding protein. 


result of mutations or polymorphisms which cause small dif- 
ferences in conformation. Indeed, two molecules of DNA 
differing in only a single base can be distinguished elec- 
trophoretically by a band-shift in polyacrylamide gels. This 
forms the basis of a very useful and widely-used screening 
method for the identification of mutations in DNA called 
single strand conformation polymorphism (SSCP) analysis 
(Figure 5.19; Example 5.4). 

The procedure involves heating DNA fragments (200— 
400 bp) of a single gene from a range of samples (e.g. dif- 
ferent human individuals) in either NaOH or formamide 
to denature double-stranded DNA. This is then chilled and 
loaded onto a nondenaturing polyacrylamide gel. During 
electrophoresis, variants of a single DNA molecule will 
adopt slightly different conformations resulting in slightly 
different electrophoretic mobilities. This procedure may be 
performed with a number of samples simultaneously and 
is more convenient than direct sequencing for screening of 
samples for mutations (although the result is later confirmed 


by sequencing). This technique has come to be widely-used 
in the screening of human DNA for mutations responsible 
for genetically-based diseases. 


5.5 ISOELECTRIC FOCUSING (IEF) 


We have seen that proteins have a range of electrically- 
charged groups on their surface which contribute to pro- 
tein solubility and which underlie the different behaviour 
of individual proteins in experimental systems such as non- 
denaturing electrophoresis (Section 5.2) and ion exchange 
chromatography (Section 2.4). The net charge on a protein 
is heavily influenced by pH since the groups on the pro- 
tein surface are titratable. At acidic pH most proteins are 
positively charged while at alkaline pH they are negatively 
charged (Figure 2.9). However, in between these extremes, 
the net charge on individual proteins can vary consider- 
ably. This variation is a reflection of differences in amino 
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Figure 5.18. Simultaneous identification and sequencing of regu- 
latory sites in DNA. If identical samples are used for both DNA 
footprinting and sequencing (see Section 5.4.2), these can be elec- 
trophoresed on the same polyacrylamide gel. The regulatory se- 
quence determined for the example shown is ATC (complementary 
to TAG), although in reality, regulatory sequences are usually longer 
than this. 


(a) 
HGCTAGCH 
(b) 


123 


Band-shifts 


Mutation 


Polyacrylamide gel 
(non-denaturing) 


ELECTROPHORESIS 167 


acid sequence and/or post-translational modification. The 
pH value at which the protein has no net charge is the iso- 
electric point (pI). Under standard experimental conditions 
of temperature, buffer composition and in the absence of ex- 
tensive chemical or post-translational modification, pI may 
be regarded as a constant property of a protein (Table 5.3). 
Since this is an important biophysical parameter underlying 
the pH-dependent behaviour of proteins, we frequently wish 
to determine pI experimentally. In this section, we shall look 
at methods used in the determination and application of pI 
values in proteins. 


5.5.1 Ampholyte Structure 


The usual approach to determination of pI involves the 
formation of a stable pH gradient. It is technically dif- 
ficult to achieve such a gradient with buffer components 
described in Chapter 1 since they would simply diffuse to- 
gether in free solution. Ampholytes are synthetic, low Mr 
heteropolymers of oligoamino and oligocarboxylic acids 
(Figure 5.20). The various combinations of amino and car- 
boxylic acid groups possible allow synthesis of a wide range 
of polymers each possessing a slightly different pI. When a 
mixture of ampholytes is placed in an electric field, each 
migrates to its individual pI value thus forming a gra- 
dient of pI. Since ampholytes also have good buffering 
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Figure 5.19. Detection of mutations in DNA by single-strand conformation polymorphism (SSCP) analysis, (a) DNA is denatured to make 
it single-stranded. When placed in non-denaturing conditions, intrachain hydrogen bonding can occur. Mutations in single-stranded DNA 
may alter the intrachain hydrogen bonding pattern giving a conformational polymorphism, (b) Wild-type and mutant DNA have altered 
mobilities in fragments containing mutations. These are visualised as band-shifts in polyacrylamide non-denaturing gels. Bands can either 


be shifted up (sample 2) or down (sample 3). 
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Example 5.4 Detection of point mutations in 6-globin DNA using single-strand conformation polymorphism. 


When denatured DNA is electrophoresed under nondenaturing conditions, intrastrand hydrogen bonds can form which 
cause the molecule to adopt a number of distinct shapes or conformations. These are visualized in polyacrylamide gels as 
a series of bands. If the DNA varies from the wild type, some intrastrand hydrogen bonds are altered which, in turn, alters 
the pattern of conformations formed. This conformational change changes the pattern of bands visible in polyacrylamide 
gels. This experiment is called single-strand conformation polymorphism (SSCP) analysis. 

Mutations were randomly introduced into a 193 bp fragment of the promotor region of the -globin gene in a plasmid. 
Large variation was observed in the pattern of fragments visualized in a 5% polyacrylamide gel as shown below; 
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This experiment shows that all mutations were detectable by SSCP analysis. Interestingly, neither the precise position 
of a mutation nor its nature (i.e. whether it was a purine —> pyrimidine, pyrimidine — purine or deletion) determined how 
different the conformation was from the pattern of the wild type. Because of its exquisite ability to distinguish otherwise 
very similar DNA sequences, SSCP is now the major method for detection of mutations. However, once an SSCP is 
detected, it is necessary to sequence the DNA displaying the SSCP in order to identify the actual mutation. 

Reproduced from Glavac, D. and Dean, M. (1993) Optimization of the single-strand conformation polymorphism 
(SSCP) technique for detection of point mutations. Human Mutation, 2, 404—414, with permission of John Wiley & Sons 
Ltd. 


capacity, a gradient of pH is thus stabilized across the elec- An alternative to the use of free ampholytes is provided 
trical field. Commercially available ampholytes may be se- by immobilized pH gradients. This involves the use of 
lected to cover a wide (e.g. 3—9) or narrow (e.g. 6-7) pH derivatives of acrylamide which contain weak acid or base 


range. groups called immobilines (Figure 5.21). The acid groups are 
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Table 5.3. pI values of selected proteins at 10°C carboxylic acids with pK values of 3.6 or 4.6 while the basic 
—— rr groups are tertiary amino groups with pKs of 6.2, 7.0, 8.5 or 
Protein pl 9.3. At least one each of the acid and alkaline immobilines 
‘ are mixed together in the presence of acrylamide monomers 
a bee ae to form a polyacrylamide gel. By varying the identity and 
Amyloglucosidase 3.50 number of immobilines from the two categories available, 
Glucose oxidase 4.15 a variety of pH gradients may be generated. A particular 
Soyabean trypsin inhibitor 4.55 advantage of these gradients over those formed with free 
Ovalbumin 4.60 ampholytes is that they cover a very narrow pH-range al- 
Bovine serum albumin 4.90 lowing finer resolution between similar pI values. 
Urease 5.00 Immobilized pH gradients are formed by mixing two so- 
B-Lactoglobulin A 5.20 lutions, one containing acidic immobiline(s) plus glycerol, 
Carbonic anhydrase B 5.85 the other containing alkaline immobiline(s). The acidic im- 
Myoglobin (acidic band) 6.85 mobiline(s) will be concentrated near the end of the gradient 
ae band) o while the alkaline immobiline(s) will be concentrated near 
Cytochrome C 10.70 the top. In between, there will be a continuous gradient 


of decreasing carboxylic acid groups and increasing amino 


Lysozyme 11.00 ae 
—————— E groups. These groups act as local buffers resulting in a pH 
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Figure 5.20. Formation of stable pH gradient with ampholytes. (a) Ampholytes are short polymers containing both NH2 and -COOH 
groups. Two example structures are shown although many different individual structures will be present in a mixture (n = 2-3; R = H or 
(CH2)-COOH). (b) When mixed together in the absence of an electric field (left), a single net pH results. However, in the presence of an 
electric field (right), individual ampholytes migrate to their pI values. Here they act as buffers, resulting in a pH gradient. 
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Figure 5.21. Immobilised pH gradients, (a) The chemical structure 
of immobilines. These are derivatives of acrylamide and may be 
incorporated into polyacrylamide gels essentially as described in 
Figure 5.3. (b) Formation of immobilised pH gradient. Glyccrol 
is mixed with acidic ampholines which predominate at the bottom 
of the polyacrylamide gel. A gradient of decreasing acidic and 
increasing basic ampholines is formed. Once the gel is polymerised 
this gradient is immobilised. 


gradient which, because they are covalently linked to the 
polyacrylamide gel, is highly-stable. 


5.5.2 Isoelectric Focusing 


When a protein is placed in a pH gradient, it will migrate 
towards its pI (Figure 5.22). At pH values above pl, the pro- 
tein will possess a net negative charge which will make it 
migrate towards the anode. At pH values below pl, the pro- 
tein will have a net positive charge and will migrate to- 
wards the cathode. Only at the pI will the protein have 
zero net charge and, hence, no electric mobility (Section 
5.1). This experiment is called isoelectric focusing (IEF) 
because it involves focusing of molecules at their individual 
pl values. 


IEF may be performed in glass tubes or in a horizontal 
flat-bed format (Figure 5.23). Pre-cast IEF gels are com- 
mercially available widely. Alternatively, ampholytes may 
be mixed with components necessary for the formation of a 
polyacrylamide gel. Once the gel has polymerized, an elec- 
tric current is applied forming a pH gradient. Protein sample 
is then loaded (as an aqueous solution) and electrophoresis 
is resumed. When focusing is complete, the gel is fixed with 
trichloroacetic acid. In the case of flat-bed horizontal gels, 
pl is determined by comparison with standard proteins of 
known pI run on the same gel. In tube gels, the gel is cut 
into small pieces along its length and the pH of each piece is 
then measured. A graph of distance along the gel versus pH 
may then be plotted. The migration distance of the protein 
under investigation allows estimation of pI. 


5.5.3 Titration Curve Analysis 


Knowledge of pH-dependent effects on net charge of pro- 
teins is helpful in many experimental situations such as, 
for example, designing ion-exchange protocols (Chapter 2). 
Since the complement and pH-dependent behaviour of 
amino acid side-chains is highly individual, we find that 
a characteristic titration curve can be obtained for each 
protein. Such curves can be determined with the use of 
ampholytes in IEF gels (Figure 5.24). A pH gradient is first 
established in a low-percentage (e.g. 4-5 %) polyacrylamide 
gel. Then the gel is rotated 90° and protein sample is applied 
in a trough at the centre of the gel. Proteins migrate across 
the pH gradient in a manner dependent on their charge while 
ampholytes, since they are at their pI and uncharged, remain 
in position and maintain a stable pH gradient. Near pH 3, 
most proteins have a net positive charge while, near pH 9, 
most have a net negative charge. Since the gel is of low per- 
centage acrylamide, the main factor determining mobility at 
and between these extremes is net charge rather than protein 
mass or shape. This experiment allows the identification of 
specific pH values where two proteins may have different 
charge, thus facilitating rational design of an ion exchange 
protocol. 


5.5.4 Chromatofocusing 


If an ion exchange resin (Section 2.4.1) were equilibrated at 
pH 9.0 and then exposed to a second buffer at pH 7.0, a pH 
gradient would be formed between these two buffers. How- 
ever, this gradient would be narrow because of the limited 
buffering capacity of the two buffers across the whole range 
9.0-7.0 and it would be quite unstable. A chromatographic 
technique called chromatofocusing (Figure 5.25) based on 
the same principle as IEF uses the high buffering capacity 
of an ion-exchange resin coupled with amphoteric buffers 
to generate a stable pH gradient. Amphoteric buffers are 
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Figure 5.22. Isoelectric focusing (IEF). When placed in a pH gradient, proteins with positive net charge will migrate towards the cathode 
while those with a negative net charge migrate towards the anode (arrows). Proteins at their pI have no net charge and hence no electrophoretic 
mobility. 
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Figure 5.23. IEF experimental format, (a) IEF in glass tubes. The pH gradient is formed in glass tubes which are placed in an electrophoretic 
device. One sample is loaded per tube. After IEF, the gel is cut into thin slices and the pH of each slice is determined. From this a graph of 
mobility versus pH may be generated. From the mobility of protein band, pI of protein may be determined, (b) The flat-bed format allows 
TEF of standard proteins of known pI alongside sample protein. Comparison with standards allows determination of pI. 
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Figure 5.24. Titration curve analysis of proteins, (a) A stable pH 
gradient is established in a 5% polyacrylamide gel containing am- 
pholytes by pre-electrophorcsis. (b) Gel is rotated 90° and sample 
is loaded in central trough. Electrophoresis causes proteins 1 and 2 
to migrate differently due to their differing net charge across the pH 
range. Ampholytes do not move since they have no net charge at 
their pI. Proteins 1 and 2 give different titration curves and their pIs 
may be measured as the point where the curve and sample trough 
intersect. From this experiment we could conclude that, at pH 6, 
protein 1 has a net positive charge while protein 2 has a net negative 
charge. 


similar to ampholines in that they have good and even buffer- 
ing capacity across a broad range of pH. In this experiment 
it is possible to achieve selective elution of proteins very 
close to their pI values. 

An anion-exchange resin is first equilibrated at a partic- 
ular pH (e.g. pH 9.4), followed by amphoteric buffer. This 
buffer has good buffering capacity across a wide pH range 
(e.g. 9.0-6.0) and automatically forms a pH gradient through 
the column in the range 8.3—6.4. Sample proteins are now 
applied. At the top of the column (pH 6.4), the protein will 
be positively-charged and will pass freely through the resin 
without binding but encountering a gradually increasing pH. 
Eventually, the protein will reach its pI and, soon after this, 
acquire a negative charge which will allow it to bind to the 
anion exchange resin. As elution buffer continues to be ap- 
plied to the column, the pH at the binding site of the protein 
would be gradually lowered giving the protein again a neg- 
ative charge and causing it to dissociate from the resin. The 
protein would elute further down the column until a pH is 
reached which would again give it a positive charge causing 
it to bind to the resin once more. In this way, individual pro- 
teins move down the column being continuously desorbed 
and adsorbed as the pH gradient progresses through the resin 


bed. If a second batch of sample protein were added to the 
column sample proteins would behave similarly to those 
of the first batch and quickly ‘catch up’ at pH values just 
below protein pI. This focusing effect gives chromatofocus- 
ing a high capacity. Eventually, each protein in the mixture 
applied elutes from the resin at a pH value slightly above 
its pl. 


5.6 IMMUNOELECTROPHORESIS 


Antibodies are proteins which are highly evolved for specific 
recognition of immunological determinants called antigens. 
These antigens may be low molecular mass molecules such 
as sugars or else more complex structural components of 
biopolymers such as proteins. The specificity of this recog- 
nition makes antibodies, especially immunoglobulins of the 
G class (IgGs), particularly useful molecular probes for 
the structure of proteins, macromolecular complexes and 
viruses. The principle sources of antibodies are either ex- 
perimental animals such as rabbits (which raise polyclonal 
antibodies in their serum in response to injected antigen) 
or mouse hybridoma cells (which produce monoclonal anti- 
bodies in the culture medium in which cells are grown). We 
have seen how, once electrophoretically separated, we often 
wish to achieve specific recognition of individual proteins 
in complex mixtures (Section 5.2.3). Antibodies have found 
a number of uses in such specific detection. 

When immunodetection is combined with electrophoretic 
separation of proteins, this technique is called immunoelec- 
trophoresis and it will be described in Sections 5.6.2-5.6.5. 
It should be noted that immunoelectrophoresis techniques 
are generally performed in nonrestrictive gels with very 
large pores relative to the size of the antigen and antibody 
molecules. For proteins, these gels are composed usually 
of agarose but for low molecular mass molecules such as 
peptides, they may be polyacrylamide. The purpose of non- 
restrictive gels is to encourage separation based only on 
differences in charge between proteins rather than differ- 
ences in mass or shape. They also allow very fast separa- 
tions which may take less than one hour compared to the 
days required for immunodiffusion procedures. The tech- 
niques described in this section are based on recognition 
of antigens possessing native structure. Some other popular 
immunodetection methods (e.g. western blotting; Section 
5.10.2) involve recognition of denatured antigens. 


5.6.1 Dot Blotting and Immunodiffusion Tests 
with Antibodies 


Before carrying out immunoelectrophoresis, it is first neces- 
sary to confirm that antibodies are present which are specific 
for an antigen of interest and which have a sufficiently high 


ELECTROPHORESIS 173 


pH gradient 
formation 
9.4 6.4 
tenon i A pH gradient is 
uilbrated established along 
a H 9.4 Amphoteric the column 
ini buffer 
(Polybuffer 96) 
9.4 8.3 
Sample is 
loaded 
6.4 ; 
: ara ps Protein pl 7.6 
4 binds to resin 
Sample proteins a r 71 
(pl 7.0 and 7.6) 
77 
8.3 
Focusing and 
elution of 
sample 
6.4 
A280 pH 
64 71 8.0 
7 
6.4 
—— — — i 
t 64 
+ 6.4 Fraction 
8.3 8.2 7.4 


A A 


Figure 5.25. Chromatofocusing. An anion exchange resin (PBE 96; Pharmacia Biotech) is equilibrated at pH 9.4. An amphoteric buffer 
(Polybuffer 96) is applied, resulting in the formation of a stable pH gradient (8.3-6.4). A protein sample consisting of two components 
(pi 7.0 and 7.6) is applied. These bind to resin when negatively charged (i.e. at pH 7.1 and 7.7, respectively). As polybuffer 96 continues 
to be loaded, the pH gradient moves down the column, continuously desorbing and re-adsorbing the two proteins. They elute at pH values 


near their pls. 


affinity or avidity for this antigen to give sensitive detection. 
In this section two nonelectrophoretic experiments will be 
described in which antigens and antibody are brought to- 
gether to allow detection and quantification of antibody 
concentration. Measurement of presence and level of an- 
tibody is necessary since there is considerable variability 
in the immunogenicity of different antigens, some antigens 
giving a strong immunological response in individual exper- 
imental animals but not in others. Conversely, some antigens 


are highly immunogenic while other antigens give little de- 
tectable immunological response. 

When a protein and antibody encounter each other, a spe- 
cific binding event occurs between the variable region of 
the antibody and a region of the antigen called the epitope. 
In the case of polyclonal antibodies (i.e. antibodies raised to 
a range of epitopes) a number of binding sites may exist on 
the surface of a single antigen. Monoclonal antibodies rep- 
resent a population of identical antibodies which recognize 
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Figure 5.26. Binding of antibody to antigen. The variable region 
of the antibody binds specifically to a region of the antigen called 
the epitope. Different epitopes in protein structures may be recog- 
nised by specific antibodies. Polyclonal antibodies recognise many 
epitopes in a single antigen while monoclonal antibodies recognise 
only one. 


a single epitope. If an excess of either antibody or antigen is 
used, staining of the complex formed between them can be 
used to measure levels of antigen or antibody, respectively. 
In dot blotting, antigen is applied to a nitrocellulose mem- 
brane to which it binds (Figure 5.26). Additional binding 
sites are saturated with nonantigen protein such as bovine 
serum albumin (BSA). A solution of antibody may then 
be added allowing specific binding of the antibody to the 
antigen. The complex formed between these molecules may 
be quantified by staining with a second antibody which has 
two important properties. Firstly, it is specific for the species 
in which the first antibody was raised, usually recognizing 
the constant region of the IgG molecule (e.g. antigoat IgG 
or antirabbit IgG). Secondly, an enzyme (e.g. horseradish 
peroxidase) is covalently attached to it which facilitates 
quantification. For every molecule of antigen, one antibody 
molecule will bind. The second antibody, in turn, will specif- 
ically bind to this IgG. Since an enzyme is attached to this 
second antibody, many thousands of molecules of substrate 
will be converted by this enzyme into a product which is 
coloured. Considerable amplification of staining is there- 
fore achieved compared to that which might be expected, 
for example, with a protein dye which would only stain the 
original antibody-antigen complex. This factor makes dot 
blotting a highly-sensitive detection procedure for antibod- 
ies. It is usual to carry out the experiment with a range of 
dilutions of the antibody solution, thus giving an indica- 
tion of the antibody concentration or titre in serum taken 
from an experimental animal or in hybridoma cell culture 
medium. 

A second frequently-used, nonelectrophoretic procedure 
to demonstrate the presence and specificity of antibodies 
in various sources is Ouchterloney double-immunodiffusion 
assay in agarose gels. At low ratios of antigen to an- 
tibody, initially soluble antibody-antigen complexes are 
formed. As this ratio is increased, however, extremely large 


macromolecular assemblies of repeating antibody-antigen- 
antibody-antigen units are formed. This occurs at an opti- 
mum antigen/antibody ratio which is called the equivalence 
point. This complex is insoluble and will immunoprecipi- 
tate to form precipitin. In double immunodiffusion, a small 
amount (3-5 ul) of antiserum is placed in a central well 
and a range of dilutions of antigen is added to similar wells 
surrounding it (Figure 5.27). Both antigen and antibody so- 
lutions freely diffuse through the agarose (hence double 
immunodiffusion). If there is specific binding of antibody 
to the antigen, then precipitin forms. This is visible to the 
naked eye but may also be stained with coomassie blue af- 
ter washing away nonprecipitated protein. If two different 
antigens (e.g. two different proteins) recognize the same an- 
tibody, a common precipitin line is formed, that is the two 
lines fuse (Figure 5.28). These proteins are said to have 
immunological identity. If the proteins are nonidentical, the 
precipitin lines cross. In the case of partial identity between 
the antigens, a spur is formed. This type of experiment gives 
useful evidence for structural similarity between different 
antigens. 


5.6.2 Zone Electrophoresis/Immunodiffusion 
Immunoelectrophoresis 


In zone electrophoresis/simmunodiffusion immunoelec- 
trophoresis, it is first necessary to perform electrophoretic 
separation of antigens in an agarose gel containing a trough 
as shown in Figure 5.29. After electrophoresis, a solution of 
antibody is poured into this trough. As with Ouchterloney 
double immunodiffusion, antigens and antibodies diffuse 
towards each other through the agarose and form precipitin 
arcs. This technique is useful for qualitative detection of 
antigens in serum, cell extracts or other preparations. 


5.6.3 Rocket Immunoelectrophoresis 


At pH 8.0, most proteins possess a net negative charge and 
will electrophorese towards the anode. IgG molecules pos- 
sess little net charge at this pH, however, and consequently 
experience little electrophoretic mobility. However, because 
of movement of water towards the cathode called electroos- 
motic flow which will be described in detail later, (Sec- 
tion 5.10.1) IgGs migrate towards the cathode. Advantage 
of this is taken in rocket immunoelectrophoresis (Figure 
5.30). Antigen is placed in wells cut in an agarose gel con- 
taining antibodies. During electrophoresis, most antigens 
migrate towards the anode while IgG molecules migrate 
towards the cathode (electroosmotic flow). At first, solu- 
ble antibody-antigen complexes are formed at low antibody 
concentration and these continue to move towards the anode. 
However, as electrophoresis proceeds and all the antigens 
enter the gel, insoluble precipitin forms. When the gel is 
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Figure 5.27. Dot blot assay, (a) Antigen binds to nitrocellulose membrane and bovine serum albumin (BSA) is added to saturate non-specific 
binding sites, (b) Antibody is added to membrane forming an antibody—antigen complex, (c) A second antibody is added which binds to the 
constant region of the IgG molecule. This antibody is conjugated to the enzyme, horseradish peroxidase. (d) Schematic of arrangement on 


nitrocellulose. (e) A dot blot of increasing volumes of antiserum. 


Figure 5.28. Ouchterloney double immunodiffusion. (a) Antibody 
is placed in a central well (shaded) of an agarose gel. A range of 
dilutions of antigen is placed in wells 1-6 (e.g. 1 : 10000 to 1: 100 
in phosphate-buffered saline). Both antibody and antigen solutions 
diffuse outwards in all directions (arrows), (b) When equivalence is 
reached, prccipitin is formed which is visible as a white arc against 
a clear background. 


stained, this precipitin is found to form characteristic rocket 
shapes around each antigen well. The area under this shape 
is proportional to the antigen concentration in that well since 
greater amounts of antigen will migrate further towards the 
anode. Therefore, using standard antigen solutions of known 
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Figure 5.29. Patterns of reaction in Ouchterloney double immun- 
odifiiision assays, (a) Fused precipitin lines mean that the antigens 
in wells 1 and 2 have immunological identity, (b) Crossed precip- 
itin lines mean that the antigens in wells 1 and 2 are non-identical 
immunologically. (c) A spur between precipitin lines mean that the 
antigens in wells 1 and 2 are partially identical immunologically. 
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Figure 5.30. Zone electrophoresis/immunodiffusion immunoelectrophoresis. Protein antigens are first separated electrophoretically in an 
agarose gel. Antibody is added to the trough in the gel. Antibodies and antigen diffuse through the gel and, at equivalence, form prccipitin 
arcs. In this experiment, two antigens are identified in the protein mixture. 


concentration, a calibration curve may be generated and the 
antigen concentration of a range of samples conveniently 
determined. 


5.6.4 Counter Immunoelectrophoresis 


In counter immunoelectrophoresis, antigen is placed in a 
well at the cathode end of an agarose gel while antibody is 
placed in a well at the anode end (Figure 5.31). The anti- 
gen migrates towards the anode (electrophoretic flow) while 
the antibody migrates towards the cathode (electroosmotic 
flow). When they meet, a precipitin band is formed. 


g Agarose gel 
Antigen containing 
samples antibodies 


© Electrophoresis 
y 


5.6.5 Crossed Immunoelectrophoresis (CIE) 


A fourth form of immunoelectrophoresis is two-dimensional 
or crossed immunoelectrophoresis (CIE). This technique in- 
volves initial (i.e. first dimension) electrophoretic separation 
of antigens in an agarose gel. This is followed by cutting 
out a section of gel containing the separated proteins (Fig- 
ure 5.32). A second agarose gel containing antibodies (i.e. 
second dimension) is formed against this section and elec- 
trophoresis is performed at 90° relative to the axis of the first 
gel. Precipitin arcs form similar to those of rocket immu- 
noelectrophoresis and these allow quantification of several 
antigens in a single sample. 
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Figure 5.31. Rocket immunoelectrophoresis. An agarose gel is prepared containing antibodies. Decreasing concentrations of antigens are 
placed in wells 1—4. As antigens electrophorese towards the anode, precipitin is formed at the equivalence point. The area of the rocket 
shapes formed is proportional to the concentration of antigen in the well. 
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Figure 5.32. Counter-immunoelectrophoresis. Antigen moves to- 
wards the anode by electrophoretic flow. At pH 8.6, IgG is un- 
charged but is carried towards the cathode by electroosmotic flow. 
At the equivalence point, precipitin forms. 


5.7 AGAROSE GEL 
ELECTROPHORESIS OF 
NUCLEIC ACIDS 


5.7.1 Formation of an Agarose Gel 


Agarose gels are formed by first heating a suspension of 
agarose and then allowing it to cool with consequent for- 
mation of a gel. Heating disrupts the largely intramolecular 
hydrogen bonding pattern of agarose while cooling allows 
the reformation of hydrogen bonds (Figure 5.33). At least 
some of the time, these reformed bonds will be intermolec- 
ular in nature and it is these bonds which are responsible 
for gel formation and stabilization. Unlike other polysac- 
charide gels, agarose is largely uncharged and does not give 
secondary ionic interactions with nucleic acids or proteins. 
It is nonetheless a hydrated polymer due to extensive hydro- 
gen bonding to water. Pores formed in agarose gels are much 
larger than those in polyacrylamide. This has important im- 
plications for the mass and shape of molecules for which 
agarose gel electrophoresis is suitable (Section 5.8.3). 


5.7.2 Equipment for Agarose Gel Electrophoresis 


Although both vertical and horizontal gels are possible in 
principle, in practice agarose gels are usually formed in 
horizontal plastic moulds (Figure 5.34). This is sometimes 
called the submarine gel electrophoresis technique. Sample 
wells are formed in the gel by insertion of a plastic comb 
into the warm gel suspension before setting. This comb 
is removed once the gel has set producing a set of wells 
into which individual samples may be loaded. The gel is 
placed in a flat-bed electrophoresis tank for electrophoretic 
separation. This experiment allows side-by-side separation 
of a number of samples which facilitates direct comparison 
between them. For preparative electrophoresis, a single large 
well is formed by use of a continuous comb. 
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(a) Electrophoretic separation of antigens in agarose gel 
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Figure 5.33. Crossed immunoelectrophoresis (CIE). (a) Antigens 
are separated by electrophoresis in agarose gels, (b) A slice is taken 
from this gel and a second gel is poured against it which contains 
antibodies. Electrophoresis is then performed at 90° to the direction 
of gel in a). At the equivalence point, precipitin forms. Two of the 
proteins in the original mixture are antigens for the antibody shown 
and their relative amounts may be calculated from the area under 
the peaks. 


5.7.3 Agarose Gel Electrophoresis of DNA and RNA 


Electrophoresis of nucleic acids is dependent on molecu- 
lar mass in the range 1-25 kb. For DNA masses less than 
1 kb it would be necessary to use gels of high percentage 
agarose but gels of concentration greater than 1% exhibit 
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Figure 5.34. Formation of agarose gel. A suspension of agarose 
is heated to 70°C which disrupts intramolecular hydrogen bonds 
between —OH and O groups in agarobiose. On cooling, some in- 
termolecular hydrogen bonds form (dashed lines) which result in 
a hydrated gel network. The greater the percentage agarose, the 
smaller the average pore size of the gel. 
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Figure 5.35. Submarine electrophoresis in agarose gels, (a) Gel is 
formed in horizontal mould. Sample wells are formed in the gel 
during cooling by insertion of a comb, (b) Electrophoresis is per- 
formed under electrophoresis buffer (i.e. submarine) in a horizontal 
(i.e. flat-bed) format. 


high electroosmotic flow (Section 5.10.1) which places an 
effective lower limit on the mass of nucleic acid it is possible 
to separate. While newer commercially-available agaroses 
are almost capable of matching their impressive resolving 
power, in practice polyacrylamide gels are normally used to 
study nucleic acids of smaller mass (e.g. <200 bp). 

We have seen in Section 5.7 that agarose gels are largely 
nonrestrictive for proteins. However, compared to most pro- 
teins, nucleic acids are extremely long molecules with a 
large axial ratio (ratio of longitudinal to transverse axes of 
the molecule). Since, during migration through the gel the 
nucleic acid is free to rotate through all possible orienta- 
tions, it migrates as if it has an extremely large molecular 
mass (Figure 5.35). The same is true for proteins but, be- 
cause their axial ratios are much smaller than nucleic acids, 
the apparent mass is much closer to the actual mass. This 
means that agarose gels, while not restrictive for compact 
proteins, are restrictive for nucleic acids and should give 
separation based on differences in charge and mass. As nu- 
cleic acids have a fairly uniform charge-distribution, this 
means that they separate in practice according to differences 
in mass. Agarose gels are normally used in the concentra- 
tion range 0.8-1.5% allowing separation of molecules up to 
50 kb (although 20 kb is the upper limit for high-resolution 
separations). 

Above 20kb, separation of DNA becomes increasingly 
independent of molecular mass. This has been explained by 
two principal models (Figure 5.36). In the biased-reptation 
model, a DNA molecule larger than the average pore-size 
of an agarose gel is proposed to display the motion of a 
crawling snake (i.e. reptation) adopting elongated shapes 
orientated in the direction of the electric field. Large DNA 
molecules will therefore ‘slither’ through the agarose pores 
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Figure 5.36. Effect of axial ratios on electrophoresis of biopoly- 
mers. (a) Rod-like molecules such as nucleic acids and fibrous 
proteins have a large axial ratio (y/x). The example shown has a 
ratio of approximately 28 while the globular molecule of similar 
mass has a ratio of 1 (for simplicity, these representations take no 
account of the third dimension of molecular size, z). (b) Rod-like 
molecules migrate as if they have a mass many times greater than 
their actual mass. The rodlike molecule shown migrates with an 
apparent mass equivalent to that of the shaded molecule. 


in a mass-independent manner. An alternative model is 
supported by later work with both fluorescently-labelled 
DNA and computer simulations which show that large DNA 
molecules move through agarose gels undergoing cycles of 
elongation and contraction along the axis of the electric 
field. The leading part of the DNA molecule becomes con- 
centrated into a compact head which moves through the gel 
at a constant rate independent of molecular mass. During 
transition between contracted and extended structures, the 
DNA often adopts an overall U-shape around agarose fibres 
with the tail of the molecule gradually catching up on the 
head before taking over as the leading part of the molecule. 
In this model, the DNA behaves somewhat like a caterpillar 
with the leading and trailing parts of the U-shaped molecule 
alternately being sieved through the agarose pores in a 
manner independent of molecular mass. This latter model 
depends on DNA behaving as an elastic molecule during 
electrophoresis. 


Table 5.4. Some examples of separations performed using 
capillary electrophorcsis 


Sample component CE mode Detection 


Metals Free solution UV 
Catecholamines Free solution Electrochemical 
Propranolol (chiral Free solution UV 
separation) 
Amino acids Free solution UV 
Peptides Free solution UV 
Insulins SDS PAGE UV 
Serum proteins Discontinuous free UV 
solution 
DNA Free solution Fluorescence 
DNA (mutation Pulsed field in UV 
detection) ultradilute 
sieving solutions 
DNA (restriction Agarose Fluorescence 
fragments) 
Oligonucleotides Free solution UV 
Virus particles Free solution UV 
Bacterial Free solution UV 


5.7.4 Detection of DNA and RNA in Gels 


Once separated, nucleic acids may be detected with the 
aid of fluorescent dyes such as ethidium bromide (Table 
3.4). This intercalates into double-stranded DNA to give 
strong fluorescence. Although weaker fluorescence is found 
in single-stranded molecules such as RNA, it is still possi- 
ble to visualize bands under a UV lamp. Usually, this dye 
is incorporated into the electrophoresis buffer which means 
that staining and electrophoresis occur simultaneously. Elec- 
trophoresis may be halted while the gel is inspected under 
UV light and, if bands are not adequately separated, the gel 
may be replaced in the electrophoresis system and separa- 
tion continued. Because dyes such as ethidium bromide are 
also mutagens, great care must be taken in handling them. 
Newer, nonmutagenic dyes which equal or exceed the sen- 
sitivity of ethidium bromide are now widely available. 


5.8 PULSED FIELD GEL 
ELECTROPHORESIS 


DNA in vivo is organized in extremely large single 
molecules with high axial ratios (see previous section) which 
are arranged in chromosomes. It is therefore necessary to 
process DNA by restriction digestion to obtain sufficiently 
small molecules for separation and study in agarose or 
polyacrylamide gels. Sometimes though, we wish to study 
intact chromosomes or large DNA fragments of chromo- 
somes which exceed the resolution-range of these systems. 
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An electrophoretic technique called pulsed field gel elec- 
trophoresis has been specifically developed for separations 
in the mass range 200-3000 kb (Example 5.5). 


5.8.1 Physical Basis of Pulsed Field 
Gel Electrophoresis 


All the electrophoretic techniques we have so far encoun- 
tered in this chapter use a constant and unidirectional elec- 
trical field. However, in pulsed field gel electrophoresis, the 
direction of the field is varied continuously during elec- 
trophoresis (Figure 5.37). This is achieved by alternately 
turning on and off the current or pulsing the electrical field 
in short time intervals called the pulse time, ty. Intact chro- 
mosomal DNA or large fragments of chromosomal DNA are 
obtained by first lysing cells in an agarose matrix by treat- 
ment with a detergent or with enzymes such as lysozyme. 
This treatment often generates fragments of chromosomal 
DNA due to mechanical shearing. Large fragments may also 
be generated, if required, by digestion with unusual restric- 
tion enzymes recognizing rarely-occurring 8-nucleotide re- 
striction sites (e.g. Asc I, Not I). Slices of agarose containing 
large DNA molecules are then loaded on 0.6-1.5% agarose 
gels and electrophoresis is initiated. An array of electrodes 
is arranged around the gel such that the direction of elec- 
trophoresis may be continuously varied (see next section). 
Pulses of electricity are passed through these electrodes for 
times in the range 0.1 s to 90 min before changing the di- 
rection of the field and passing pulses of electricity through 
the field in the new direction (Figure 5.37). The direction 
of electrophoresis of DNA is constantly alternated between 
the two electric field directions as the molecules reorientate 
themselves through a reorientation angle. 

The precise reason why large DNA fragments separate 
so well on pulsed field gels is not yet fully understood. It 
is thought that, during reorientation, the helical structure is 
stretched and then compressed. The time required for this 
to occur is the visceoelastic relaxation time, t,, and it is 
dependent on molecular weight. Small molecules reorient 
themselves to the changed electric field much more quickly 
than larger molecules which means that the latter have less 
time available to electrophorese during a given electrical 
pulse. Accordingly the larger the molecule, the slower it 
will migrate through the gel. The ratio of t, to tp is crucial 
for good separation in pulsed field electrophoresis. If tp is 
much shorter than t, then there is insufficient time for the 
molecule to reorientate itself in response to the pulse. If 
tp is much Jarger than t, then the molecules will behave 
as in conventional agarose gel electrophoresis. Under both 
of these conditions large DNA molecules will not separate 
according to their mass. When the ¢,/tfp ratio is between 0.1 
and 1, optimal separation on the basis of mass is achieved. 
Since t, is much longer for larger molecules, this means that 
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Example 5.5 Determination of chromosome size in bacteria by pulsed field gel electrophoresis. 


Actinobacillus pleuropneumoniae causes a contagious pulmonary disease in pigs which is of major worldwide agricultural 
importance. A number of varieties or serotypes of this bacterium have been identified and these differ widely from each 
other in their pathogenicity and geographical distribution. Agar blocks were made with cells from a range of serotypes and 
these were digested with lysozyme and proteinase K. Pre-electrophoresis was then performed to remove any degraded 
or extrachromosomal (e.g. plasmid) DNA. The blocks were then digested with restriction enzymes such as Asc I (5’- 
GGCGCGCC-3’). These were subjected to contour-clamped homogeneous electric field gel electrophoresis (Figure 5.41) 
with pulse times of 5-20 s for 24h at 200 V. The pattern of medium sized (20—400 kb) restriction fragments obtained for 
13 serotypes (S1 to S12) is shown below (M is a standard made from polymers of bacteriophage A genome); 


S1 S9S11 M S2 S3 S4 S5a S5b S6 S7 M S8 S10 S12 


Similarity coefficient [%] 


Analyses of this type showed a fragmentation of the A. pleuropneumoniae genome into 6-12 fragments of molecular 
mass 11-1217 kb. Summing the sizes of all fragments for each serotype gave a total genome size of 22 283 to 2409 kb for 
this bacterium. Moreover, since significant polymorphism was evident in the digestion patterns of 12 of the 13 serotypes 
(see above). It was found possible to compare the degree of genetic relatedness between the serotypes using pulsed field 
gel electrophoresis by analysing digestion patterns obtained with Apa I. This is summarized in the above dendogram 
comparing percentage similarity of digestion patterns with Apa I for each serotype. This analysis shows, for example, 
that serotypes 1, 9 and 11 are closely-related to each other while distantly related to serotype 4; 

This type of study gives useful information on genome size and the degree of relatedness between different strains 
of pathogenic bacteria. It is particularly appropriate in studying the geographical spread of such organisms and for 
monitoring the appearance and likely origins of new serotypes. 

Images reproduced from Chevallier et al., (1998) Chromosome sizes and phylogenetic relationships between serotypes 
of Actinobacillus pleuropneumoniae. FEMS Microbiol. Letts. 160, 209-16, by permission of the Federation of European 
Microbiological Societies. 
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Figure 5.37. Some models for migration of large DNA molecules through agarose gels, (a) Biased-reptation model. DNA transverse to 
direction of electrophoresis cannot enter pores in gel. DNA orientated along axis of electric field can enter pores, (b) ‘Caterpillar’ model. 
DNA forms a compact mass which becomes retarded on an agarose fibre. It extends into a U-shape around the fibre. The head explores new 
routes through the gel. The tail is pulled along behind the head until compact mass is again formed. The tail can act as head in the next cycle 
of extension and contraction. Migration of head region through gel is independent of mass. 


pulse times of as little as 0.1 s are adequate for separation of 
molecules of 10 kb while pulse times of 1000 s are required 
in the mass range approaching 10 Mb. 

Any factors affecting t, will alter the resolution of pulsed 
field gel electrophoresis. Experimentally these have been 
found to include; DNA molecular mass (see above); elec- 
trical field strength, E (reorientation is faster in stronger 
electrical fields); gel concentration (affects pore size); tem- 
perature (affects buffer viscosity); reorientation angle (larger 
angles give longer t,). 


5.8.2 Equipment Used for Pulsed Field Gel 
Electrophoresis 


In a homogeneous electrical field a DNA molecule experi- 
ences a uniform field strength throughout electrophoresis. 
An inhomogeneous field arises due to variations in the elec- 
trical field strength. Both types of field arise during pulsed 
field electrophoresis. There is an upper limit to the electrical 
field strength which may be used in the experiment since 
too high values of E result in trapping of DNA in the gel. 
This limitation results in longer t, values which can push the 
time theoretically required for individual experiments from 
one day (1 Mb, tp 61 s, 1000 pulses) to 14 months (20 Mb, tp 
38 000 s, 1000 pulses). In practice, electrophoresis is usually 
performed for 12—24h. 


Specialized apparatus featuring an array of electrodes 
is required for pulsed field gel electrophoresis. The 
shape or geometry of this array differs between particu- 
lar types of pulsed field electrophoresis experiments (Fig- 
ures 5.39-5.42). A unique electric field orientated in a par- 
ticular direction is achieved by varying which electrodes 
carry current, for how long and in what direction. Flow of 
current to the electrodes and pulse-times are electronically 
controlled allowing selection of optimized separation con- 
ditions. Agarose gels are poured and electrophoresed in a 
flat-bed, horizontal format as shown in Figure 5.37 and the 
gel may be arranged in a variety of orientations relative to 
the array of electrodes. 

In orthogonal field alternating gel electrophoresis (Fig- 
ure 5.38), pairs of electrodes are arranged at 90° relative 
to each other, resulting in creation of an inhomogeneous 
electrical field. This inhomogeneity has the disadvantage 
that lanes are distorted outwards at the bottom of the gel. 
In contour-clamped homogeneous electric field gel elec- 
trophoresis (Figure 5.39), point electrodes are arranged 
hexagonally around the gel. This results in a homoge- 
neous electrical field in all lanes of the gel. Crossed-field or 
rotating-gel electrophoresis (Figure 5.40) takes advantage of 
a constant electrical field with the gel is held on a rotating 
platform. This is limited to separation of DNA >50 kb since 
smaller molecules require t, less than one second which is 
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Figure 5.38. Pulsed field gel electrophoresis. DNA fragments are exposed to an electric field which alternates in different directions as a 
result of electrical pulses. The DNA moves through the gel in response to this field. Small molecules move more quickly than large ones due 
to their shorter visceoelastic relaxation times. For simplicity, an array of only four electrodes carrying equal charge is shown. In reality, the 
array may consist of many electrodes arranged in various geometries (Figure 5.40-5.43) and carrying different proportions of the current 
thus generating a complex electric field. Separation is optimised by varying the pulse-time, homogeneity and geometry of the field. 


the minimum time required to rotate the gel. These three 
methods give a simple linear dependence of mobility on 
molecular mass. Field-inversion gel electrophoresis (Figure 
5.41) uses a periodic inversion of the electric field in one 
direction (i.e. a reorientation angle of 180°). The forward 
pulse time is longer than that in the reverse direction thus 
resulting in net forward motion of DNA. Unlike the other 
pulsed field methods described, the dependence on DNA 
molecular mass in this technique is near-parabolic. Particu- 
larly large or small molecules progress rapidly through the 
gel with little resolution while intermediate-sized molecules 
move more slowly and are well-resolved. 

For a given sample it is necessary to vary the pulse-time, 
field-strength, homogeneity and geometry of the electric 
field created across the gel to optimize separation. 


5.8.3 Applications of Pulsed Field 
Gel Electrophoresis 


Because this technique facilitates analysis of DNA frag- 
ments on the scale of individual chromosomes, it has found 
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Figure 5.39. Orthoganol field-alternating gel electrophoresis. The 
electric field is alternated between two directions (arrows) by puls- 
ing between two pairs of electrodes, the reorientation angle is that 
formed between directions 1 and 2. Note the outward distortion of 
bands near the end of gel. 


extensive uses in large-scale mapping of chromosomes. This 
has particular importance in bacterial taxonomy allowing the 
identification of relationships between existing and novel 
strains of bacteria. In eukaryotes, pulsed field gel elec- 
trophoresis of yeast chromosomes has revealed widespread 
length polymorphisms in both sexual and asexual species. 
The technique has also been applied to studies on strand- 
breaks in human chromosomes as a result of exposure to 
toxic chemicals. 
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Figure 5.40. Contour-clamped homogeneous electric field elec- 
trophoresis. Point-electrodes (labelled 1-24) are arranged in a 
hexagonal array around the gel as shown. Electrodes 1—4 are held 
at a negative potential while electrodes 13—16 are held at the same, 
though positive, potential. Intermediate potentials are applied to all 
other electrodes in the array. To change the direction of the electric 
field by 105°, the negative potential is then applied to electrodes 
9-12 while positive potential is applied to electrodes 21-24 and 
other electrodes again adopt intermediate potentials. This system 
achieves a homogeneous electric field throughout the gel. 
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Figure 5.41. Rotating-gel electrophoresis. Gel is rotated back- 
wards and forwards through a reorientation angle (e.g. 120°). Since 
the minimum rotation time is approximately 1 s, this technique can- 
not be used for separations requiring pulse times less than 1 s (i.e. 
DNA of mass <50 kb). 


5.9 CAPILLARY ELECTROPHORESIS 


When describing the basic theory of electrophoresis (Section 
5.1.1) it was pointed out that use of high voltages leads to 
considerable heat generation. As well as affecting the struc- 
tural integrity of biopolymers, heat is likely to affect the 
resolution attainable with electrophoresis since convection 
due to heat will decrease resolution of electrophoretically- 
separated bands. This places an effective upper limit on the 
voltage which may be used in most electrophoresis experi- 
ments at 300-500 V. If electrophoresis is performed in very 
thin capillaries, with small internal volumes and large sur- 
face areas, however, it is to be expected that it will be possi- 
ble to use higher voltages than normal since heat generated 
would be efficiently dissipated. This is the basis of capil- 
lary electrophoresis (CE) which is now a major analytical 
technique for the study of biomolecules, macromolecular 
assemblies and even intact cells (Example 5.6). 


5.9.1 Physical Basis of Capillary Electrophoresis 


Use of CE in mapping of peptides is described in Figure 
5.42. The glass capillary used in CE has an inner-diameter 
of 10-100 um, an external diameter of 300 um and a typi- 
cal length of 10-100 cm. This gives an included volume for 
separation which is approximately a thousand times smaller 
than that of a slab gel or a HPLC column. The capillary 


ELECTROPHORESIS 183 


is usually filled with a buffer (free solution CE) although it 
may also be filled with a gel such as agarose, polyacrylamide 
or a noncrosslinked linear polymer (gel electrophoresis CE 
or capillary gel electrophoresis). All of the electrophoresis 
applications so far described in this chapter such as isoelec- 
tric focusing, nondenaturing electrophoresis, SDS PAGE 
and pulsed field gel electrophoresis are possible in capil- 
lary systems. The main difference between electrophore- 
sis in capillaries and that in rod or slab gels is the very 
high voltages (5-50 kV) possible in capillary systems. If 
such voltages were used in slab or rod gels, the heat gener- 
ated would make successful electrophoresis impossible. Use 
of capillaries facilitates highly-efficient heat dissipation. A 
further difference between capillary and slab/rod gel elec- 
trophoresis is the small sample volumes (nL) and loadings 
(Attomoles) necessary for capillary electrophoresis (see Ex- 
ample 5.7). This is a consequence of the small volume of the 
capillary used and means that CE is principally an analytical 
technique suitable for highly-sensitive detection of as little 
as 107!! M sample components. 

The resolution attainable is also extremely high and CE 
has found particular uses in the high-resolution separation 
of chiral mixtures. The number of theoretical plates in a 
capillary used in CE is given by Equation (5.9); 


uV 
N =—— 5.9 
D (5.9) 


where u is electrophoretic mobility, V is the applied voltage 
and D is the diffusion coefficient of the molecule in the 
electrophoresis medium. High voltages in CE give a high 
number of theoretical plates and hence, higher efficiency of 
separation. 

The time, t, required for a molecule to pass through a 
capillary of length L is given by Equation (5.10); 


L? 


t= 
u: V 


(5.10) 


Equations (5.9) and (5.10) mean that column length 
makes no contribution to the efficiency of separation but 
does influence separation time while higher voltage gives 
better separation efficiency. Since u and D are properties 
of the sample component, they are not amenable to exten- 
sive experimental variation. Therefore the most efficient CE 
system would be expected to include the shortest capillary 
possible (to shorten separation time) and the highest voltage 
possible (to maximize separation efficiency). In practice, 
the ability of the capillary efficiently to dissipate heat limits 
both of these variables leading to use of maximum voltages 
of 50 kV and minimum capillary lengths of 10 cm. 

Since a major component of the glass used in capillaries 
is negatively-charged silica, a layer of counter-ions such as 
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Example 5.6 Detection of prion protein by capillary electrophoresis (CE) 


Transmissible spongiform encephalopathies cause progressive degeneration of the central nervous tissue in mammals. 
While scrapie has been known in sheep for over two centuries, bovine spongiform encephalopathy (BSE or ‘mad cow 
disease’) and associated human conditions such as Creuzfeld—Jakob disease and Kuru have only relatively recently been 
described and understood to form a family of related diseases. The mode of transmission of these illnesses is thought to 
be unique in that they may not be primarily caused by infectious agents such as bacteria and viruses. Protein tangles have 
been observed to accumulate in the brain tissue of infected animals which are protease-resistant aggregates of a normal 
protein (the prion protein) from which the N-terminal 30-50 residues have been proteolytically removed. The normal 
protein is approx. 40% a-helix while the abnormal form is 45% B-pleated sheet with considerably less a-helix and it is 
thought that these two forms can interconvert. This conformational interconversion is stimulated by the presence of small 
amounts of abnormal protein and even by short peptides derived from this abnormal protein. It is hypothesized that small 
amounts of abnormal protein in the diet can avoid proteolysis in the gut, cross the blood-brain barrier and infect brain 
cells promoting conformational change in the normal protein and resulting in the formation of tangles. 

Prions may be identified by western blot analysis but this method of detection is probably not sensitive enough to detect 
the early stages of infection in sheep or cattle. Moreover, large amounts of brain tissue are required for such analyses 
and the results are not completely quantitative. SDS PAGE CE in a 57cm capillary at a voltage of 15 kV was therefore 
assessed as a method for detection of the abnormal prion protein in samples from infected sheep brain. Separations 
obtained for A) Normal and B) scrapie-infected individuals are shown below; 


(a) 
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A peak eluting from the capillary at 26 min absent in all controls, was identified in all infected sheep. This separation 
could be performed in <40 min. and is sensitive and robust enough to detect the prion protein with a minimum of 
sample. While some 20 mg of brain tissue is required for a single western blot as little as 50 ug is necessary for each CE 
experiment. 
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Reproduced from Schmerr, M.J., Jenny, A. and Cutlip, R.C. (1997) Use of capillary sodium dodecyl sulfate gel 
electrophoresis to detect the prion protein extracted from scrapie-infected sheep. Journal of Chromatography B, 697, 
223-9, with permission of Elsevier. 
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Figure 5.42. Field-inversion gel electrophoresis. Electrodes alter- 
nate between being cathode or anode. The pulse time or field 
strength in the forward direction is represented by the downward- 
facing arrow while that for the backward direction is represented by 
the upward-facing arrow. The ratio between the forward and reverse 
pulses depends strongly on the molecular mass of sample DNA and 
must be chosen to yield sizedependent resolution. Very small and 
very large DNA migrate quickest through the gel. 


Nat and H* assemble along the interior surface of the capil- 
lary (Figure 5.43). This layer of ions, in turn, attracts a hydra- 
tion layer of water. During electrophoresis, the positively- 
charged Na*/H* ions are electrophoretically attracted to- 
wards the cathode and carry the hydration layer of water with 
them. This phenomenon is called electroosmotic flow. This 
is acommon occurrence during electrophoresis and we have 
previously encountered this term in a number of applications 
of electrophoresis described in this chapter. Electroosmotic 
flow has a particular significance in CE, however, since the 
internal volume of the capillary is so small. The thin hy- 
dration layer (together with other water molecules to which 
this layer is hydrogen bonded) represents a major fraction 
of the water contained within the capillary. Electroosmotic 
flow is therefore quantitatively particularly important in CE. 
Electroosmotic flow is strongly pH-dependent being up to 
ten times faster at low than at high pH. 

In addition to electroosmotic flow, sample components 
also experience electrophoretic flow (i.e. positively-charged 
ions are attracted to the cathode, negatively-charged ions 
to the anode). The precise mobility of an individual ion 
will therefore be the result of a combination of these two 
flows (Figure 5.44). Positively-charged ions will tend to flow 
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Figure 5.43. Free solution capillary electrophoresis of peptides in SDS. A typical CE analysis of peptides generated by digestion with a 
specific protease. In this example, the peptides arise from an insoluble antigen (HC-31) of the hepatitis C virus which was digested with 


Staphylococcus aureus V8 protease (Winklcr et al., 1996). 


toward the cathode as a result of both electroosmotic flow 
and electrophoretic flow. Uncharged sample components 
will move towards the cathode in response to electroosmotic 
flow only (since they will experience no electrophoretic 
flow). Negatively-charged ions will move either towards the 
cathode or the anode depending on the relative strength of 
the two flows. 

This gives rise to some apparent paradoxes since un- 
charged molecules move in a CE system (in most elec- 
trophoresis systems they experience no movement) and 
negatively-charged ions can move towards the cathode in 
such systems (which seems to contradict electrostatic attrac- 
tion). By varying pH and hence the strength of electroos- 
motic flow, the separation of individual components can be 
optimized in CE. If it is desired to carry out separations on 
the basis of differences in electrophoretic flow alone, it is 
possible to coat the inner surface of the capillary to remove 
ionic interactions. Such interactions may be undesirable, for 
example, in separation of proteins. 

A major limitation of CE is the limited volume of sample 
which may be applied. In many cases, the components of 
the sample may be quite dilute and difficult to detect de- 
spite the high-sensitivity detectors used. Isotachophoresis 
is an electrophoretic technique involving separation in a 


discontinuous buffer system which has found wide appli- 
cation in CE. In capillary isotachophoresis a sample ion 
is electrophoresed under constant current behind a leading 
ion of higher mobility and in front of a terminating ion of 
lower mobility in a capillary (Figure 5.46). The relationship 
between the mobilities of these ions is given by; 
UL > Ha > Ug > HT (5.11) 
where A and B represent sample ions. When an electri- 
cal field is applied at constant current, the field strength is 
greater in the vicinity of lower-mobility ions and lower in 
the region of higher-mobility ions and the product of field 
strength and mobility is constant for each ion so that all ions 
migrate at the same velocity. 
ML: Er = ua: Ea = Wp: Eg = ur: Er (5.12) 
If this were not the case then a ‘gap’ could develop be- 
tween the ions which would stop the current being carried al- 
together. The relationship described in Equation 5.12 results 
in a zone sharpening effect. Ions which diffuse into a zone 
with higher mobility will be slowed because of the weaker 
electrical field. Conversely, ions diffusing into a zone of 
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Figure 5.44. Electroosmotic flow in capillary electrophoresis. The silica of which glass capillary is composed is negatively charged. This 
attracts a layer of positively charged counterions (Na+/H+) which, in turn, is coated with a hydration layer of water. During electrophoresis, 
the counterion layer migrates towards the cathode, bringing the hydration layer with it. Since this layer is, in turn, hydrogen bonded to bulk 
water and since the capillary is very narrow, this means that there is a net flow of water towards the cathode. 
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Figure 5.45. Interaction between elcctrophoretic and electroosmotic flow in capillary clcctrophorcsis. Cations move towards the cathode 
in response to both electroosmotic and electrophoretic flow. Uncharged molecules move towards the cathode in response to electroosmotic 
flow only. Anions move dependent on the balance between electroosmolic and electrophoretic flows. In the example shown, electrophoretic 
flow is sufficiently strong to overcome electroosmotic flow giving net mobility towards the anode. However, if electroosmotic flow were 
strong enough, electrophoretic flow could be overcome and net mobility to the cathode could result. Note that electrophoretic mobility is 
stronger in multiply charged molecules than in those which are singly charged. 
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Figure 5.46. Capillary isotachophorcsis. Sample components A 
and B are electrophoresed in front of a terminating ion (T) and be- 
hind a leading ion (L). Ions A and B concentrate into distinct ‘stacks’ 
due to a combination of local differences in the field strength and 
their differing electrophoretic mobilities. Ion A has a lower ele- 
clrophoretic mobility than B which, in turn, has a mobility less than 
L. The stacks move with a constant velocity through the capillary. 


lower mobility will be accelerated due to the stronger elec- 
tric field. In isotachophoresis, sample ions therefore form 
stacks during electrophoresis. The concentration of each 
sample component in these stacks is a fixed function of that 
of the leading ion. For sample component A, the concentra- 
tion of A, ca, is given by; 
ca = fcr) (5.13) 
By varying the concentration of the leading ion, we can 
vary the concentration of each sample component, regard- 


less of their actual concentrations in the original sample 


(Figure 5.47). 
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Figure 5.47. Concentration effect during capillary isotachophore- 
sis. The concentration of sample component A is related directly to 
that of the leading ion L. During isotachophoresis, the length of the 
zone occupied by A alters to achieve this concentration. In this way 
isolachophoresis can be used to concentrate dilute samples prior 
to CE. 


Isotachophoresis is performed in teflon or quartz capillar- 
ies at voltages up to 30kV and may be used to concentrate 
samples for application to CE. The limited loading volumes 
possible with CE are therefore overcome. 


5.9.2 Equipment Used in Capillary Electrophoresis 


The experimental apparatus used in CE is described in Fig- 
ure 5.48. A 5-50kV DC source provides the electrical po- 
tential difference between the electrode buffer tanks or elec- 
trolyte chambers. These are connected by a narrow capillary 
of variable length (~50-100 cm). Detection is possible by a 
number of different methods such as conductance and elec- 
trochemical detection. MS (Chapter 4) can also be interfaced 
with CE and used as a detector allowing a two-dimensional 
characterization of sample components. For proteins, pep- 
tides and nucleic acids laser-induced fluorescence in the 
UV-visible part of the spectrum is especially popular for 
high-resolution work. Because the voltages used in this 
equipment are potentially lethal, care is necessary when 
carrying out CE experiments. Sample throughput can be 
greatly improved by the use of an autosampler, data acquisi- 
tion/storage systems and the recent development of capillary 
array electrophoresis which allows simultaneous analysis in 
an array of up to 96 capillaries. 

Since heat dissipation is critical for the success of CE 
it is essential that heat gradients between the interior wall 
of the capillary and the outside be as low as possible. This 
depends on the rate of heat transfer between the outermost 
layer of the capillary and the surrounding surface. Where the 
surrounding surface is simply air, the temperature gradient 
between the innermost and outermost layer of the capil- 
lary can be as large as 125 °C/mm. Forced air cooling can 
reduce this to 51°C/mm while arranging contact with a 
temperature-controlled aluminium plate with high thermal 
conductivity reduces it further to 7°C/mm. Effective ther- 
mostatting is therefore acommon feature of CE instruments. 


5.9.3 Variety of Formats in Capillary Electrophoresis 


CE offers a number of distinct advantages for the anal- 
ysis of biological samples. The technique facilitates 
rapid separations of complex mixtures in minutes with at- 
tomole sensitivity. The small volume of the capillary allows 
analysis of sample loadings in the nL and ng range and very 
small amounts of reagent are consumed (in free solution CE 
flow rates of ul/min are used). Moreover, since a wide vari- 
ety of electrolyte buffers and gels can be used in CE, a range 
of separation conditions can be assessed to optimize sepa- 
ration. Table 5.4 summarizes some CE separations ranging 
from atoms to whole cells. 

A range of CE separation conditions may be assessed by 
varying the contents of the capillary. The pH can be changed 
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Example 5.7 Detection of mutant proteins by western blotting. 


Protein kinases and phosphatases (PPs) combine to play a major role in the regulation of cell function by, respectively, 
phosphorylating and dephosphorylating proteins. Three groups of phosphatases (which hydrolyze phosphate groups from 
phospho-amino acids) have been described in eukaryotes differing on substrate specificity (i.e.; protein serine/threonine, 
tyrosine and serine/threonine/tyrosine phosphatases). One of the protein serine/threonine phosphatase enzymes is found 
in two distinct gene products; PP2Ca and £. Five cDNA clones for the latter have been identified which differ only in their 
C-terminal sequence and which also show tissue-specific expression. Crystallographic data and point mutations, however, 
identified a possible N-terminal catalytic domain. It is thought that PP2C isoenzymes may have distinct physiological 
roles in each tissue but the relative importance of the N- and C-terminal domains to this are unclear. 

This question was probed by selectively deleting parts of the PP2Cf sequence from both the N- and the C-terminus 
(A). These mutants were purified from a GST fusion expression system (Chapter 2; The residues Gly - Ser- Pro originate 
from the GST portion of the fusion) and analysed by western blotting (B) and catalytic assay of dephosphorylation (C). 

The western blot (B) carried out with antiserum to PP2Cf showed a single band except in the case of AC(313-390) and 
AC(291—390) where two bands were observed. The blot is an important part of the experiment as it allows confirmation of 
deletion of large parts of the N- and C-terminal domains as evidenced by the greater electrophoretic mobility of mutants. 
Moreover, the presence of two bands for AC(313-—390) and AC(29 1-390) allowed identification of inappropriate thrombin 
digestion of the GST-PP2C8 fusion (Figure 2.19). It is clear from the results in C that the first 12 N-terminal residues 
are essential for catalytic activity while mutations at the C-terminal end had much more modest effects. The inactivation 
observed for AC(291-—390) may well be due to gross structural changes in the protein. 
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(Continued) 


Western blots facilitate the identification of structurally-similar proteins in SDS PAGE separations and are important 
structural probes in studies of proteins. 

Images reproduced from Kusuda et al. (1998) Mutational analysis of the domain structure of mouse protein phosphatase 
2CB. Biochemical Journal, 332, 243-50, by permission of Portland Press. 


to alter electroosmotic flow while changes in the voltage can 
alter both electrophoretic and electroosmotic flow. Agents 
such as ampholytes, SDS and urea can be included in the 
electrophoresis buffer to create denaturing conditions. In- 
corporation of ligands within the capillary can be used to 
estimate ligand affinity of proteins or DNA since the elec- 
trophoretic mobility of a ligand-protein complex will differ 
from both uncomplexed ligand and protein, allowing quan- 
tification of all three species. Incorporation of proteins such 
as BSA in electrophoresis buffers allows chiral separations 
in CE because each of a pair of enantiomers will differ in 
binding affinity for the protein thus giving them different 
electrophoretic mobility. 

CE electrophoretic separation may be combined with 
chromatography (Chapter 2) in a number of ways. CE may 
be interfaced with HPLC chromatography systems to facil- 
itate two-dimensional analysis of complex samples. In such 
systems, samples eluting from HPLC are analysed imme- 
diately by CE. An alternative approach is to pack the CE 
capillary with a suitable stationary phase and to combine 
electrophoretic and chromatographic separation within the 
capillary. Anionic detergents such as SDS may be included 
in the capillary in micellar electrokinetic separations. The 
detergent forms micelles which act as a stationary phase with 
which sample components may interact in different ways. 
In capillary electro-chromatography C-18 reversed phase 
stationary phase is packed into the capillary and electroos- 
motic flow pumps sample through this phase. Again, sample 
components separate based partly on individual affinity for 
C-18 and partly on individual electrophoretic mobility. 

CE provides high-resolution analysis which complements 
techniques such as HPLC and MS and is versatile across the 
whole range of structural complexity found in biochemistry. 


5.10 ELECTROBLOTTING 
PROCEDURES 


Electrophoresis allows separation of complex samples us- 
ing equipment which, in most cases, is cheap and universally 
available and procedures which have been shown to be ro- 
bust and versatile. The resolution attainable in electrophore- 
sis is often better than that obtained in HPLC and MS analy- 
ses and is far superior to techniques such as low performance 


column chromatography. It is sometimes possible to take ad- 
vantage of the high-resolution nature of electrophoresis to 
isolate samples difficult to purify by other means for further 
study. One approach to this is to transfer separated sample 
components onto a mechanically-stable membrane which 
may be manipulated in further analyses. This transfer pro- 
cess is called blotting and, where electrophoresis is the basis 
of transfer, it is termed electroblotting. A method for blot- 
ting DNA restriction fragments was developed by Prof. Ed. 
Southern and called Southern blotting. Subsequently, other 
techniques were developed for blotting proteins (Example 
5.7) and RNA and named after other points of the compass 
to denote the fact that they were all based on a similar exper- 
imental approach (Figure 5.48). Even though some of these 
blotting methods are not electrically-based, they will be de- 
scribed together in this section to clarify the relationship 
between them. 


5.10.1 Equipment Used in Electroblotting 


An outline of an electroblotting experiment is shown in 
Figure 5.49. Electrophoresis may be performed in a va- 
riety of vertical or horizontal gel formats. The separated 
biomolecules are then electrophoresed onto a membrane in 
an electroblotting apparatus. This is achieved by laying the 
gel flat on a piece of membrane and ‘sandwiching’ this be- 
tween two flat electrodes which apply an electrical field at 
right angles to the plane of the gel. The sample components 
electrophorese from the gel and attach irreversibly to the 
membrane by hydrophobic interaction. The electroblotting 
apparatus may consist of a tank in which the gel-membrane 
sandwich is completely immersed in buffer or it may be a 
semidry system where buffer is provided by buffer-soaked 
filter papers which bracket the sandwich. The latter system 
allows more economic use of buffer. The membrane used 
most widely in electroblotting is nitrocellulose. However, 
where further chemical analysis is desired (e.g. N-terminal 
microsequencing of proteins by the Edman degradation), 
more robust membranes such as polyvinylidenedifluoride 
(PVDF) may be used. 


5.10.2 Western Blotting 


If proteins are separated by SDS PAGE, they can be trans- 
ferred to nitrocellulose by electroblotting and stained with 
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Figure 5.48. Apparatus used for capillary electrophoresis. Sample is applied to sample inlet near the inlet reservoir and electrophoresis 
takes place towards the outlet reservoir. The capillary contains buffer (free solution CE) or a gel (gel electrophoresis CE). Because of the 
narrowness of the capillary, heat is dissipated very efficiently allowing the use of very high voltages. 


antibodies essentially as described for the dot blot assay blocking solution, which blocks any nonspecific antibody 
in Figure 5.27. This experiment is called western blotting binding sites. This is followed by washing with an antibody 
(Figure 5.50). Proteins which have been separated by SDS solution (e.g. rabbit IgG). Antibodies bind to epitopes on 
PAGE are transferred to a membrane to which they bind. The proteins. Bound antibodies are visualized by washing with 
membrane is then washed with a protein solution called the a second antibody specific for the class of antibodies to 
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Figure 5.49. Blotting techniques widely used in biochemistry. Southern blotting was originally named after Professor Ed. Southern. The 
other techniques were named after other points of the compass to reflect their overall similarity of approach. 
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Figure 5.50. Outline of electroblotting experiment. Sample is elec- 
trophoresed in a polyacrylamide or agarose gel. It is then transferred 
to a membrane such as nitrocellulose by electrophoresis at right an- 
gles to the original direction. The membrane is then stained to 
identify particular sample components. 


which the first antibody belongs (e.g. antirabbit IgG). These 
second antibodies, which are available commercially, are 
raised to the constant part of the IgG structure and are there- 
fore useful for all antibodies of that class. They are usually 
attached to an enzyme such as horseradish peroxidase and 
their site of binding may be visualized by washing the mem- 
brane with a suitable chromogenic substrate for this enzyme. 
It will be recalled that this visualization approach is similar 
to that described for in situ immunostaining (Section 3.4.6) 
and to dot blotting (Section 5.6.1). 

Because selective staining is achieved with antibodies, 
this technique often helps in clarification of structural rela- 
tionships between proteins. It is useful, for example, in com- 
paring structurally-related proteins such as haemoglobins 
and serum albumins and also in identifying new strains of 
bacteria and viruses. Since we can identify an immunoblot- 
ting protein in a complex mixture separated by SDS PAGE, 
we know the molecular mass of the polypeptide as well as 
its immunological identity. 

Protein preparations used for western blotting need not 
be pure and the experiment can therefore be performed on 
crude preparations such as serum, cell lysates and cell-wall 
extracts. A single protein preparation can be screened with a 
panel of antibodies or, alternatively, a single antibody may be 
used to screen a variety of protein preparations. In the latter 


experiment, the various preparations can be electrophoresed 
side-by-side to allow direct comparison between them. An 
important variable in western blotting is protein transfer 
efficiency. Small and soluble proteins transfer best while 
larger and more fibrous proteins may transfer with very low 
efficiency. For this reason, it is usual to stain the gel after 
blotting to establish how much of each component of the 
sample has successfully transferred to the membrane. 

Since transfer between gel and membrane covers a 
very short distance, it should be complete in a short time 
(<1h). In cases where a particular protein does not transfer 
efficiently, it may be wise to vary the transfer conditions 
rather than extending the transfer time to several hours. 
Useful points of variation include the presence/absence 
of methanol, SDS concentration, ionic strength of transfer 
buffer and current density (mA/cm?). 


5.10.3 Southern Blotting of DNA 


Because of its large size, DNA is often digested with re- 
striction enzymes to generate smaller restriction fragments 
suitable for separation on agarose gels. Since restriction 
enzymes are sequence-specific, the pattern of fragments ob- 
tained is usually characteristic for a particular piece of DNA 
and is called a restriction digest map. It is possible to se- 
quence restriction fragments as described in Section 5.4.2. 
However, where many fragments have been generated, this 
would be very time-consuming. It would be better if se- 
quence relationships could be identified between fragments 
either within a digest or between digests from different sam- 
ples before carrying out direct sequencing. 

Southern blotting (Figure 5.51) allows such comparisons 
by blotting the DNA fragments onto a membrane followed 
by washing with DNA probes complementary to the se- 
quence of interest. We have already seen examples of the 
use of such probes in in situ hybridization (Chapter 3). Nu- 
cleotide bases in the probe hydrogen bond to bases in the 
blotted DNA fragments according to the rules of Watson- 
Crick base pairing (i.e. A-T and G-C). The specific recogni- 
tion offered by base pairing may be regarded as analogous 
to the recognition of individual epitopes by antibodies in 
western blotting. 

After electrophoretic separation of restriction fragments, 
the agarose gel is exposed to alkali to denature the DNA. 
Fragments are transferred to nitrocellulose by capillary ac- 
tion (although electroblotting and vacuum-transfer are also 
possible) as described in Figure 5.52. The DNA is bonded 
to the membrane and then hybridized to a labelled probe 
(which may be an oligonucleotide ora DNA fragment). Only 
DNA fragments with sequences complementary to the probe 
will hybridize. This process is heavily dependent on the 
temperature and salt concentration in which the hybridiza- 
tion is performed. Where a very high degree of sequence 
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Figure 5.51. Western blotting, (a) Proteins are electroblotted onto nitrocellulose and stained. Non-specific binding sites are first saturated 
with BSA or some other protein. The nitrocellulose is then washed with an antibody specific for one or more of the electrophoretically- 
separated proteins. This antibody binds specifically to these proteins on the surface of the nitrocellulose. The membrane is then washed 
with a solution of a second antibody (specific for the first antibody) which is conjugated to an enzyme such as peroxidase. Peroxidase 
converts a substrate into a coloured product allowing visualisation of the location of target proteins on the blot. Note that more than one 
molecule of second antibody can bind to each IgG. (b) SDS PAGE was carried out in three separate, identical gels on a mixture of proteins 
of known identity and mass (track 1) and two individual samples (tracks 2 and 3). Gel A was stained with coomassie blue making all protein 
bands visible. Western blotting was carried out with two separate antibodies on the other two gels. The blot obtained for these are B and 
C, respectively. An immunologically identical band was found in both samples (arrow in B) and a second band also immunoblotted which 
was present only in sample 3. This could be a breakdown product of the common immunoblotting band but is clearly distinct from the 
co-migrating band in sample 2. A different protein is detected by antibody 2. 


complementarity is expected, high stringency conditions 
(i.e. high temperature and/or low salt concentration) are 
used. Where the degree of complementarity is low, low 
stringency conditions (i.e. low temperature and high salt 
concentration) are used. Under the latter conditions imper- 
fect hybrids containing some mismatches can form. The 
length and composition of the probe also affects the strin- 


gency with longer sequences requiring lower stringency. 
The degree of stringency required in a given experiment is a 
matter for experiment and frequently varies between iden- 
tifying too many positives and no positives at all. More de- 
tailed information will be found in the references cited in the 
bibliography but hybridization is usually performed 12°C 
below the theoretical melting temperature of the probe/target 
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Figure 5.52. Southern blotting of DNA restriction fragments. DNA sample is digested with one or more restriction enzymes to generate a 
series of restriction fragments. These are separated on an agarose gel. The fragments are then transferred by capillary action to a nitrocellulose 
membrane (upward arrows). This is hybridised with a labeled probe (which may be an oligonuclcotide or DNA fragment). Positive bands 
are identified and related to bands visible in the original gel. This allows recognition of restriction fragments complementary to the sequence 


of the probe. 


hybrid, Tm, which is calculated from the length (/), and per- 
centage G and C content (% G + C) of the probe as follows; 


Tm = 69.3° + 0.41[%(G + C)] — 650/1 (5.14) 


A typical hybridization temperature is 68 °C but this can 
be reduced to 43 °C by the inclusion of formamide. 

The principal use of Southern blotting is the compar- 
ison of DNA preparations from different sources. Under 
conditions of high stringency, only fragments which are 
almost identical will hybridize to the same oligonucleotide 
probe. Thus very similar pieces of DNA can be identi- 
fied in samples from related bacteria, different tissues (in 
mammals) or different individuals. If a cDNA molecule is 
used as a probe instead of an oligonucleotide, restriction 
digests of genomic DNA can be studied to locate fragments 


corresponding to the cDNA. Polymorphisms in the pattern 
of fragments generated by restriction enzymes are due to 
interindividual differences in DNA sequence and are often 
encountered in genetic diseases. These are called restriction 
fragment length polymorphisms (RFLPs). 


5.10.4 Northern Blotting of RNA 


RNA may also be separated on agarose gels and blotted in 
the same way as DNA fragments and such blots are called 
northern blots. The main difference between these experi- 
ments is that RNA is quite unstable and would hydrolyze in 
the presence of alkali. Accordingly, RNA is denatured before 
electrophoresis by treatment with formaldehyde followed by 
blotting. Nylon membranes or membranes to which RNA 
may be covalently attached are also used in this procedure 
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Figure 5.53. Preparation of polypeptides for further analysis by electroblotting. Polypeptides separable by electrophoresis may be transferred 
to a PVDF membrane. This is stained for protein in aqueous dye. Once bands become visible, the polypeptide of interest is excised and 


subjected to further analysis. 


rather than nitrocellulose. This technique is routinely used 
to demonstrate and quantify expression of particular mRNA 
molecules in cells and may, for example, give size estimates 
for specific mRNAs or reveal how many different types of 
the mRNA may exist. 


5.10.5 Blotting as a Preparative Procedure 
for Polypeptides 


Most of the applications of electrophoresis techniques are 
analytical allowing separation, visualization and sometimes 
quantification of complex mixtures. Because of the high- 
resolution separations possible in electrophoresis, we occa- 
sionally identify polypeptides separable by electrophoresis 
which prove difficult or impossible to purify by conven- 
tional techniques (Chapter 2). In such cases, we can use 
electrophoresis as a preparative procedure. It is possible to 


electroelute DNA and proteins from gels and, in this way, 
to collect sufficient sample for further experimentation. In 
the case of DNA we can clone such molecules into suitable 
vectors while, in the case of proteins, we can perform direct 
chemical and structural analysis. Blotting of polypeptides 
onto membranes (Figure 5.53), however, offers a particu- 
larly attractive way of preparing these molecules for further 
analysis since the membrane may act as a solid phase allow- 
ing high-efficiency removal of contaminants such as buffer 
components. This also minimizes loss of sample during han- 
dling and facilitates analyses such as N-terminal microse- 
quencing and MS analysis (Chapter 4) on as little as a few 
ug of material. Because nitrocellulose has limited chemi- 
cal stability to corrosive chemicals such as trifluoroacetic 
acid, it may be replaced by vinyl-based membranes such 
as PVDF. Another advantage offered by PVDF is the fact 
that this membrane does not bind protein stains such as 
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coomassie blue while nitrocellulose does. It is consequently 
possible to visualize stained protein bands against a white 
background using PVDF (Figure 5.53) while nitrocellulose 
membranes stain nonspecifically making such visualization 
impossible. 

Using electroblotting, therefore, we can obtain detailed 
chemical and structural information on polypeptides with- 
out the prior necessity of obtaining them quantitatively as 
highly-purified preparations. 


5.11 ELECTROPORATION 


Electric fields can have important effects on charged 
biomolecules. Most applications of such fields in biochem- 
istry involve in vitro separations. There are relatively few 
examples of the use of electric fields on living cells. Elec- 
troporation is a technique which takes advantage of the 
effects of high voltages on cell membranes to facilitate the 
uptake of foreign genetic material. This process is called 
transformation. 


5.11.1 Transformation of Cells 


When a gene is cloned into a suitable vector, it is often 
desired to insert this vector into a suitable bacterial, fungal, 
plant or animal host-cell. This would generate a genetically- 
altered cell which is said to be transformed. If the vector is 
simply mixed with the host-cell, very few, if any, of the 
cells take up vector because the cell membrane acts as an 
efficient barrier against such foreign material. A wide vari- 
ety of biological, chemical and physical methods have been 
developed to increase the uptake efficiency of transformed 
cells (Figure 5.54). Biological approaches include the use of 
bacterial, animal and plant viruses or parts of viruses in the 
vector. Viruses recognize receptors on the surface of cells 
and are internalized from these receptors. The most widely- 
used chemical method involves making cells competent to 
take up vectors by treatment with CaCl,. Physical methods 
for introduction of vectors include microinjection, firing mi- 
croscopic tungsten pellets coated with genetic material as 
microprojectiles into blocks of cells and electroporation. 


5.11.2 Physical Basis of Electroporation 


Electroporation involves exposing cells to a pulse of high 
voltage electricity. It is thought that this causes reorienta- 
tion of molecules in the cell membrane making it transiently 
porous to vectors. The technique is especially useful in trans- 
forming eucaryotic cells. When dealing with plant cells, the 
cell wall is first removed by enzymatic digestion to form a 
protoplast. 
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Figure 5.54. Transformation of cells. Cells may be transformed by 
a variety of biological, chemical and physical methods to increase 
the uptake efficiency for foreign DNA. Electroporation and CaClz 
treatment make the cell membrane transiently porous to vectors 
(circles). 
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Chapter 6 


Three-Dimensional Structure 
Determination of Macromolecules 


Objectives 


e The protein folding problem. 


After completing this chapter you should be familiar with: 


° Determination of three dimensional structures of biomolecules by multi-dimensional NMR. 
e How crystals of biomolecules may be obtained. 
e Determination of three dimensional structures of biomolecules from X-ray diffraction experiments. 


The chemical and biological properties of biomolecules (es- 
pecially biomacromolecules such as proteins and DNA) are 
determined mainly by their molecular structure. This un- 
derlies important processes such as enzyme catalysis, bind- 
ing of ligands at protein receptors and correct expression 
of genetic information at the level of protein. Even mod- 
est changes in structure (e.g. arising from point mutations 
or from interaction with environmental chemicals) can re- 
sult in molecules with drastically altered biological activity. 
Several techniques have been presented elsewhere in this 
text which yield information on the approximate structure 
of biomolecules such as their quaternary structure, over- 
all shape/mass and secondary structure (e.g. Chapters 3, 4, 
5, 7 and 10). These techniques often give us crucial infor- 
mation on the importance of particular structural domains 
for function and allow comparison of normal with aberrant 
molecules. However, in order to understand the detail of 
how biomolecules function, we need high-resolution struc- 
tural information (i.e. at ~2A resolution). In practice, this 
information is particularly difficult to obtain for the two 
most important classes of biomacromolecules (proteins and 
nucleic acids) and macromolecular complexes composed of 
mixtures of these such as ribosomes and viruses. 

Since proteins adopt a much wider range of tertiary (i.e. 
three-dimensional) structures than other biomacromolecules 
and frequently mediate the functioning of other classes of 
biopolymers, this chapter will concentrate in particular on 
protein structure. We will first examine how proteins come 
to have a particular shape and then describe the two main 
experimental techniques which give us high-resolution in- 
sights into protein structure; multi-dimensional NMR and 
X-ray crystallography. These techniques are independent of 


each other and have the major difference that the appli- 
cability of NMR is in practice limited to relatively small 
proteins. However, as will be described later (Section 6.7) 
these methods also complement each other because NMR 
gives information relevant to molecular structure in solution 
while X-ray crystallography gives structural information on 
molecules in the solid (i.e. crystalline) phase. It is impor- 
tant to understand that processes similar to those which de- 
termine protein structure also underlie structure formation 
in nucleic acids and macromolecular complexes. Moreover, 
both X-ray crystallography and multi-dimensional NMR are 
just as applicable to structural investigations in other classes 
of biomolecules as in proteins. 


6.1 THE PROTEIN-FOLDING 
PROBLEM 


Proteins fold up into compact, bioactive shapes or confor- 
mations largely as determined by their primary structure 
(amino acid sequence). In computer parlance, we could re- 
gard primary structure as a sort of ‘program’ encoding a 
particular three-dimensional structure. For this reason pro- 
teins are informational biomolecules in which alterations 
in primary structure (i.e. mutations) often result in an al- 
tered three-dimensional structure lacking normal function. 
Understanding the consequences of such mutations has led 
to knowledge of the biochemical basis of many genetic dis- 
eases as well as the prospect of eventual gene therapy for 
treating such conditions. Moreover, because we can now 
readily alter primary structure, the possibility exists of cre- 
ating completely new proteins such as enzymes or antibodies 
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designed for specific biotechnological applications. A ma- 
jor limitation to advance in this new field of rational protein 
design is our lack of understanding of precisely how pri- 
mary structure determines tertiary structure. This is referred 
to as the protein-folding problem. This section describes the 
process of protein folding leading to a particular tertiary 
structure. 


6.1.1 Proteins are only Marginally Stable 


Proteins show great structural and functional complexity. 
They vary widely in size ranging from a few thousand to 
several million Da and are found in both hydrophobic and 
hydrophilic environments (e.g. biological membranes and 
cytosol, respectively). Proteins are also chemically com- 
plex and are specialized for a wide variety of cellular func- 
tions such as transport, defense and catalysis. This structural 
and chemical diversity arose by variation of the sequence 
and composition of only twenty amino acids which are the 
building blocks of proteins. In the early years of the twen- 
tieth century it was believed that proteins were essentially 
random colloids which acted as hosts for smaller bioac- 
tive components thought to be responsible for processes 
such as catalysis. The application of X-ray crystallography 
to biomolecules such as myoglobin and pepsin, however, 
revealed that proteins have a definite three-dimensional or 
tertiary structure which underlies their function. 

In the 1950s Christian Anfinsen (Nobel prize in Chem- 
istry, 1972) and his colleagues demonstrated that tertiary 
structure is a consequence of protein primary structure and 
that small proteins fold up into compact, bioactive struc- 
tures in a manner determined largely by amino acid se- 
quence alone. Anfinsen showed that, when ribonuclease is 
denatured by treatment with moderate heat or urea, it spon- 
taneously renatures on subsequent cooling or removal of 
urea. Later, Bruce Merrifield (Nobel prize in Chemistry, 
1984) chemically synthesized a polypeptide with identi- 
cal sequence to ribonuclease and found that it folded up 
spontaneously such that its physical and catalytic proper- 
ties were indistinguishable from the natural enzyme. Thus, 
primary structure does not alone determine the ultimate ter- 
tiary structure but also in some way apparently directs the 
folding of the protein into that structure. We now know (see 
below) that some large proteins often require assistance from 
other proteins to fold into their correct structure in vivo but 
Anfinsen’s insight reveals that a particular three- 
dimensional shape is a natural consequence of sequence. 

Proteins are compact structures with the interior having 
a density similar to that of a protein crystal. We can regard 
the interior as being composed mostly of a tightly folded 
polypeptide chain from which mainly hydrophobic amino 
acid side-chains radiate to occupy any available space. Pro- 
teins are also heavily solvated (40-60%) with water and, 


in cases where hydrophilic residues are enclosed in the in- 
terior of the protein, they are usually hydrogen-bonded to 
molecules of structural water. These are water molecules 
which form an integral part of the structure of the protein. For 
example, in the case of bovine pancreatic trypsin inhibitor, 
four water molecules are invariably found in specific loca- 
tions and steric arrangements in the interior of the protein. 
We can think of structural water as providing a low-energy 
option for internalizing hydrophilic side-chains thus extend- 
ing folding possibilities available to the polypeptide chain. 
The folding of even a small protein is a highly efficient pack- 
aging process which takes place remarkably quickly with, 
in the majority of cases, minimum intervention by other 
molecules. 

This folding process is accompanied by extensive des- 
olvation of the polypeptide backbone in which hydrogen 
bonding contacts between water and amide C=O and N-H 
groups are lost. A consequence of this is that such groups 
then interact with other amide groups by intrachain or inter- 
chain hydrogen bonds such as (C=O. ..H-N) (Chapter 1; 
Figure 6.1). This is probably the reason why, in proteins for 
which three-dimensional structures have been determined 
by X-ray crystallography, some 50% of amino acids (ex- 
cept glycine and alanine) adopt dihedral angle values close 
to those of a-helix and some 40% adopt values close to 
those of 6-sheet. These backbone geometries facilitate des- 
olvation by providing nonwater hydrogen bond partners for 
amide groups. 

The surfaces of proteins are more mobile than the interior 
and consist mainly of hydrophilic side-chains protruding 
from the protein. These interact with solvent water by hy- 
drogen bonding forming a solvation shell around the protein 
(Chapter 1). The accessible surface area of globular proteins 
(As) is approximately proportional to its molecular weight 
(M) as follows: 


Ag = 11/12 M°’? (6.1) 


This value is nearly twice that which would be calculated 
from the geometry of a smooth sphere showing that the 
surface of a protein is in fact quite rough. Studies with 
extrinsic fluors suchas ANS (Table 3.4; Section 3.4.4) reveal 
that many proteins have pockets on their surface to which 
such fluors can bind which are more hydrophobic than the 
average for the surface. Such pockets may provide ligand 
binding sites for the protein and they may also interact with 
hydrophobic stationary phases in hydrophobic interaction 
chromatography (Chapter 2). 

Although proteins from microorganisms specialized for 
extreme environments such as thermophilic bacteria are par- 
ticularly heat-stable, in general proteins are of only marginal 
stability. Comparatively modest increases of temperature or 
pressure can result in complete unfolding or denaturation of 
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Figure 6.1. The main forces involved in protein folding. Stabilizing forces include (a) van der Waals interactions. These are short-range 
attractive forces between any pair of atoms due to transient magnetic dipoles which they induce in each other. North (N) and south (S) 
poles of the dipoles are shown. If two atoms come too close together, repulsion between electron clouds can overcome this attraction. Each 
atom has a van der Waals radius and the sum of these radii for a pair of atoms is the van der Waals contact distance. (b) The hydrophobic 
effect. Hydrophobic amino acids such as phenylalanine ‘prefer’ to be buried in the interior of a protein rather than be exposed to solvent 
water. Repulsion of water by phenylalanine is shown (””). (c) Electrostatic interactions. Hydrogen bonds stabilize interactions between polar 
side-chains and solvent water. Salt bridges arise from electrostatic attraction between side-chains of opposite charge. The main destabilizing 


forces are (d) Chain entropy. Many more conformations are available for the protein in the unfolded state than in the folded state. (e) Bond 
strain due to unfavourable interactions in folded state (e.g. constraint into a cis rather than the more favoured trans configuration). 


Table 6.1. Estimates of net magnitudes of stabilizing and desta- 
bilizing forces in protein folding. 


the polypeptide chain. For reasons which will be explained 
in the next section, for many single-domain small proteins 
we can represent protein denaturation (and, by analogy, pro- 


tein folding) as a two-state system: : ; . AG at 25°C 

Interaction (protein of 100 residues) (kcal/mole) 
N2U (6.2) Destabilizing 

Chain entropy 330-1000 

where N and U represent the native and unfolded (i.e. de- Unfavourable folding interactions 200 

natured) forms of the protein. Stabilizing 

Adoption of a particular folded structure seems to be the Increased van der Waals bonds due to —227 

result of a delicate balancing act between large stabiliz- close packing 

ing forces and almost as large destabilizing forces (Figure Hydrogen bonds —49 to —719 

6.1; Table 6.1). A relatively weak stabilizing force is pro- Hydrophobic effect —264 
Disulfide bridges —4 


vided by van der Waals interactions (also sometimes called 
London dispersion forces) which occur between pairs of 
nonbonded atoms brought close together in space as a result 
of attraction between transient magnetic dipoles which they 
induce in each other. These forces are additive and, while not 


Note that stabilizing forces contribute a negative AG while desta- 
bilizing force AG values are positive. 


quantitatively important in small molecules, they represent 
a significant contribution to stabilization of large molecules 
such as proteins. If two atoms come closer together than the 


sum of their individual van der Waals radius, (referred to as 
their van der Waals contact distance) repulsive forces ex- 
ceed attraction. In fact, most atoms in the interior of proteins 
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are in van der Waals contact with other atoms and cannot be 
packed more densely due to the limitation imposed by their 
van der Waals contact distance. 

The hydrophobic effect is another important stabilizing 
force in proteins. This arises from folding of the protein to 
minimize exposure of hydrophobic residues to solvent wa- 
ter. Strongly hydrophobic side-chains (e.g. phenylalanine) 
can only be dissolved in water at considerable energy cost 
because they would strongly disrupt the hydrogen-bonded 
structure of the solvent. Moreover, the hydrophobic effect 
is largely entropic (rather than enthalpic) since removal of a 
hydrophobic residue from water releases solvent water from 
the constraints it had imposed leading to a net increase in 
solvent entropy. Packing of the polypeptide chain to remove 
such residues from contact with water is therefore strongly 
thermodynamically favoured. 

The last important group of stabilizing forces are electro- 
static interactions which arise from electrostatic attraction 
between either partial charges arising from the differing 
electronegativity of atoms (e.g. 6~, 6+; Chapter 1) or full 
charges arising from ionization of residue side-chains at 
physiological pH (e.g. acidic and basic residues). Electro- 
static attraction between partial charges (i.e. 5” — — — ôt) 
permit formation of low-energy, noncovalent bonds through 
hydrogen atoms called hydrogen bonds. Such bonds facil- 
itate binding of the solvation shell of water at the protein 
surface and are also responsible for binding of structural 
water. Attraction between oppositely charged side-chains 
(e.g. Asp which is negatively charged and Lys which is 
positively charged) allows formation of salt bridges which 
are frequently found to stabilize folding of sequentially dis- 
tant parts of the polypeptide chain. (This is a way of ‘bury- 
ing’ charged side-chains at lower thermodynamic cost to the 
protein). 

Further stability can be introduced by post-translational 
formation of disulfide bridges. These are covalent disulfide 
(i.e. -S—S—) bonds formed between sulphydryl (—SH) side- 
chains of cysteine residues which may be on the same or dif- 
ferent polypeptides. They are frequently found in proteins 
destined for secretion from the cell and stabilize otherwise 
unstable folded structures by decreasing conformational 
entropy. 

The principal destabilizing forces in proteins are due to 
chain entropy and unfavourable structures adopted by the 
polypeptide as a result of folding. Chain entropy represents 
the energy ‘cost’ in constraining the folded polypeptide into 
a single or small number of conformations rather than the 
greater number of conformations possible for the unfolded 
protein. Unfavourable interactions are structural features 
adopted within the folded protein which are of higher en- 
ergy than the arrangement which would normally occur in 
the unfolded polypeptide. These may represent electrostatic 
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Figure 6.2. Folded proteins are highly-sensitive to temperature. 
AG values (i.e. Gfolded — Gunfolded) for some small proteins in the 
temperature-range 10-50°C. Temperature affects individual stabi- 
lizing/destabilizing interactions in a complex way which may result 
in the protein being more or less stable at higher temperature (e.g. 
compare ribonuclease and myoglobin at 30° and 50°C). However, 
most proteins are unstable at high temperatures. Note the small 
magnitude of the energy difference which maintains the protein in 
a folded state. 


repulsion between side-chains or constraining the polypep- 
tide chain into an unfavourable arrangement (e.g. cis peptide 
bonds may occasionally occur rather than trans). 

The relative size of stabilization and destabilization forces 
results in a difference in Gibb’s free energy (AG) between N 
and U which is slightly negative, that is the native structure 
is thermodynamically favoured (Figure 6.2). Relative to the 
magnitude of stabilizing and destabilizing forces, however 
(Table 6.1), AG values for small proteins are quite mod- 
est (—5 to —20 kcal/mole at room temperature). Variation 
of temperature can alter the delicate balance between stabi- 
lization and destabilization, resulting in an unfolded struc- 
ture having a lower value of G at high temperature than 
the folded structure. This is the reason why a protein which 
is structurally stable at 50°C may become denatured when 
exposed to a temperature of 80°C for even a few minutes. 

Protein folding is therefore a thermodynamically sponta- 
neous process (i.e. folding causes a decrease in G) which 
allows the bulk of the molecules of a protein to adopt a 
single or small number of stable structures under physiolog- 
ical conditions. Preference for the folded state over the un- 
folded is marginal and is dominated by low-energy interac- 
tions (hydrophobic effect, salt bridges, etc.). The most likely 
biological rationale for this is that it allows proteins to be 
assembled and dismantled without a requirement for input 
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or release of large amounts of energy. A consequence of this 
is that the bioactive conformations of proteins are fragile 
structures compared, for example, to synthetic polymers or 
low molecular weight organic/inorganic compounds. Meth- 
ods for determination of three-dimensional structure may be 
severely constrained by this fragility. 


6.1.2 Protein Folding as a Two-State Process 


The model of protein folding shown in Equation (6.2) sug- 
gests that protein folding may be represented as a simple 
two-state process where essentially all the protein exists ei- 
ther as a completely unstructured polypeptide (U) capable 
of adopting many thousands of individual conformations 
or as a highly structured native molecule with a single or 
small number of conformations (N). This model simplifies 
the process of protein folding significantly because it takes 
no cognisance of the possibility of stable intermediates be- 
ing formed during folding. Molten globule intermediates 
in which stable, native-like secondary structure is present 
are known to form during folding of cytochrome C and ri- 
bonucleases and such proteins have been designated class 
I. Proteins which fold by a two-state process as shown in 
Equation (6.2) and which do not form measurable, stable, in- 
termediates such as chymotrypsin inhibitor 2 and cold shock 
protein B have been designated class IT. It seems that class 
I proteins fold by a hierarchical process in which structural 
information is delocalized over the protein as a whole and in 
which tertiary structure interactions with parts of the protein 
distant in sequence stabilize secondary structure in measur- 
able intermediates. This may explain why mutations often 
introduce instability to proteins but only rarely alter their 
overall fold. Class II proteins, by contrast, fold very rapidly 
based largely on local interactions such as the formation of 
secondary structure (Example 6.2). 

The model proposed in Equation (6.2) gives a good quan- 
titative approximation of the bulk of the population of a small 
single-domain class II protein at any stage of the folding pro- 
cess. What evidence do we have for such a simple model? 
Protein folding may be studied by first denaturing the pro- 
tein (e.g. with heat or a chaotropic agent such as urea) and 
then allowing it to renature after removal of the denaturing 
treatment (e.g. cooling or removal of urea). Alternatively, 
we can study protein unfolding (i.e. N — U) rather than 
directly studying protein folding (U —> N) since it is likely 
that a protein will follow similar energetic and kinetic path- 
ways to unfold as to fold. Protein unfolding may be induced 
by a wide variety of treatments such as exposure to extremes 
of heat, pH or pressure, or treatment with denaturants such 
as urea or detergents and can be monitored with a range 
of spectroscopic and electrophoretic methods (Chapters 3 
and 5). A typical denaturation profile for a protein is shown 
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Figure 6.3. Protein denaturation profile of ribonuclease A. 
Temperature-induced denaturation was determined spectroscopi- 
cally and is expressed as the fraction of unfolded (U) protein 
present. The situation when this fraction is 0.5 is described in the 
box. 


in Figure 6.3 and one of the interesting observations we can 
make from such experiments is that, regardless of the denat- 
uration treatment employed, most small proteins appear to 
denature completely over a remarkably narrow range of tem- 
perature, pH or denaturant concentration. This suggests that 
protein unfolding is a rapid and highly cooperative process. 

Spectroscopic techniques such as absorbance, fluores- 
cence and circular dichroism can be used to monitor fold- 
ing/unfolding and to calculate the fraction of protein present 
in the U state at any particular temperature/pH/denaturant 
concentration. This fraction will be 0 when the protein is 
fully folded, 0.5 when 50% denatured and 1 when fully de- 
natured. An equilibrium constant for 6.2 can be calculated 
from such measurements since 


_ WI] 


= — 6.3 
[U] (6.3) 


We would expect this equilibrium constant to be related 
to AH, the enthalpy change on unfolding by the van’t Hoff 
relationship as follows: 


dln K 
dT 


AH = RT? 


(6.4) 


where R is the universal gas constant and T is the abso- 
lute temperature of the sample. When enthalpy changes 
associated with unfolding of small proteins are directly mea- 
sured by calorimetric measurements (Sections 8.3 and 8.4), 
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it is found that they agree well with those arising from the 
van’t Hoff relationship. In fact, the ratio between the two 
AH values is 1.05 suggesting that only some 5% of pro- 
tein molecules are in a state other than the N or U states at 
any temperature. This body of evidence strongly suggests 
that protein folding is described well by a simple two-state 
model such as that shown in Equation (6.2) although a small 
amount of the protein exists in states other than N or U. 
What is this small amount of protein doing during protein 
folding? 


6.1.3 Protein-Folding Pathways 


A possible thermodynamic interpretation of protein fold- 
ing is to think of the folded conformation as occupying a 
lower free energy value than any of the unfolded conforma- 
tions. It could be imagined that the protein might sample all 
possible conformations until the lowest energy conforma- 
tion was randomly arrived at. This is known as Levinthal’s 
paradox and it may be explained by examining folding of 
a 100-residue protein. If it was assumed that each residue 
could adopt ten possible conformations (which is much less 
than the number of likely conformations available to most 
residues) then; 


Total number of conformations = 10!% (6.5) 


If the time allowed for transition between each individual 
conformation was assumed to be 107" s then; 


Time required to sample all conformations = 10” years 
(6.6) 


Since we know that proteins in fact fold on a time-scale of 
107! to 10°, it is not likely that proteins randomly sample 
all possible conformations. 

Thermodynamics does not offer a full description of re- 
action processes. We also have to consider the pathway 
through which the reaction moves, that is the kinetics of the 
process. Levinthal’s paradox is explained by the fact that, 
although there may be an astronomically large number of 
possible conformations each with an individual value of G, 
not all of these are kinetically accessible and therefore the 
search for a folded structure would be expected to be biased 
towards conformations which are accessible. The protein 
could therefore be expected to fold in a biased way along a 
definite pathway rather than by unbiased sampling. Protein 
folding may be viewed as the movement from a relatively 
extended and disordered structure of a given G value to a 
far more compact and ordered structure of lower G through 
a folding pathway (Figure 6.4). This pathway might con- 
tain local energy minima which would represent folding 
intermediates as well as higher-energy structures which we 
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Figure 6.4. A protein folding pathway. As protein folds, the com- 
pactness of the structure increases and free energy (G) decreases. 
The pathway is defined by intermediate structures (J) which have 
intermediate compactness and free energy, occupying a local en- 
ergy minimum. Although only one is shown, there may be several 
of these in a given pathway. Transition states (TS) are high-energy 
structures which provide energy barriers to the folding pathway. 
The final folded structure (N) occupies a global energy mini- 
mum and is the most compact structure. The AG between the 
U and N states provides the thermodynamic driving force for 
folding. 


can regard as transition states. The local energy minima of 
folding intermediates will be of intermediate G values be- 
tween the U and N conformations of the protein. The small 
amount of protein not in the N or U states at any given time 
during protein folding is present mainly as folding interme- 
diates along the folding pathway. 

Based on a range of experimental studies, several models 
have been proposed to explain protein folding. Usually these 
studies have focused on small proteins such as lysozyme, 
ribonuclease and bovine pancreatic trypsin inhibitor. Three 
classical mechanisms for folding have been proposed (Fig- 
ure 6.5). The framework model postulates that secondary 
structure forms independently of tertiary structure and that 
diffusion and coalescence of regions of secondary structure 
then go on to form tertiary structure (this is also some- 
times called the diffusion-collision model). The hydropho- 
bic collapse model suggests rapid collapse of the protein 
around hydrophobic side-chains and that secondary and ter- 
tiary structures form in the consequently restricted confor- 
mational space. Both of these models are consistent with 
the formation of measurable intermediates in the folding 
pathway. The nucleation model postulates that localized re- 
gions of native secondary structure act as nuclei from which 
correct tertiary structure is propagated. Like the framework 
model, this proposes formation of tertiary structure as a 
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Figure 6.5. Models of protein folding. (a) Framework model; lo- 
cal elements of secondary structure form independently of tertiary 
structure. (b) Hydrophobic collapse model; protein chain rapidly 
collapses around hydrophobic side-chains. Secondary and tertiary 
structure form in restricted conformational space around this. 
(c) Nucleation model; secondary structure elements act as nuclei 
around which tertiary structure forms. (d) Nucleation-condensation 
model; A weak central nucleus is first formed followed by stabi- 
lization with sequentially distant residues leading to a larger and 
more stable nucleus. This condenses with the rest of the structure 
until folding is complete. 


consequence of secondary structure but it does not necessi- 
tate formation of intermediates. 

More recent work involving a combination of mutational 
analysis, computer simulations using molecular dynamics 
and two-dimensional NMR (Section 6.2) has led to the pro- 
posal that a nucleation—condensation model may explain the 
folding of small proteins (Figure 6.5). This postulates that 
a weak central nucleus is formed consisting primarily of 
adjacent residues. The nucleus is stabilized by long-range 
interactions with residues distant in the sequence to form 
a progressively larger and more stable nucleus. This nucle- 
ation process is closely coupled to condensation of the rest 
of the structure around the stable nucleus until the complete 
structure is formed. 

It is unlikely that any one model of protein folding fully 
explains all cases. Every protein is an individual molecule 
and the details of folding are likely to be unique to each. It 
is probable, though, that formation of secondary structure, 
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Figure 6.6. A protein folding funnel. The funnel represents incre- 
mental changes in conformation as the protein folds from U to N. 
Entropy (S) decreases as the number of conformations available 
reduces. The parameter Q j reflects gradual forming of the correct 
structural contacts in N. Experimentally identifiable species such 
as molten globules, folding intermediates and transition states are 
shown. 


nucleation and condensation are frequent occurrences in 
most protein-folding processes. 

An alternative to the classical image of the folding path- 
way is the folding funnel (Figure 6.6). This moves away from 
the concept of a folding pathway and instead represents the 
folding of a protein along a funnel formed by two variables; 
entropy, S, which is expected to decrease on folding and 
a parameter Q; (which is not linear with free energy, G) 
which is given by 


Number of pairwise contacts in state i 
i — 


(6.7) 


Number of contacts in state N 


State i is any individual conformation along the folding 
funnel. Q; varies between the unfolded state, U (where 
it has a value of 0), to a maximum value of 1 in the 
native state, N. Thus, this parameter is expected to in- 
crease on folding at the same time as the value of S$ de- 
creases. In this representation, localized energy minima 
which in a pathway are designated as folding intermedi- 
ates instead are represented as folding traps. In other words 
some proteins in the population may occupy this localized 
minimum but others may not. Folding traps are therefore 
not obligatory in the sense that intermediates in a folding 
pathway are. 

The power of this representation of protein folding is that 
it allows us to explain Levinthal’s paradox. Distinct struc- 
tures for which experimental evidence have been obtained 
such as molten globules, folding traps and transition states 
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occupy distinct parts of the funnel but progression along the 
funnel arises from small conformational changes across a 
large continuum of values for Q; and S. This representa- 
tion has the advantage that we can imagine the search for 
a lower free energy as a driving force through the funnel 
but that aspects of the kinetics of the process such as for- 
mation of intermediate and transition states are also readily 
visualized. 

The protein folding problem is to understand how primary 
structure directs folding of a protein into a particular tertiary 
structure. The main practical difficulty in solving this prob- 
lem is that we lack sufficient experimental tools to ‘freeze’ 
all forms of the protein along the folding pathway since 
many of them have only a transient existence. For this rea- 
son we are limited to constructing models of folding which 
are based largely on quasi-stable structures such as folding 
intermediates and molten globules for which we have direct 
experimental evidence. We need to infer the existence of 
unstable species such as transition states and steps along the 
pathway by interpolating between such quasi-stable struc- 
tures. Even though most studies on protein folding have 
concentrated on small proteins, it is thought that the same 
physical principles govern folding of larger proteins. Be- 
cause of the size and complexity of the folding problem in 
larger polypeptides it has been discovered that some 10% of 
proteins require additional proteins to fold correctly in vivo. 
These are referred to as chaperones because they ‘chaper- 
one’ the protein into the correct folded structure. 


6.1.4 Chaperones 


Chaperones are required in vivo because exposure of hy- 
drophobic residues on nascent or partially unfolded polypep- 
tide chains to the aqueous environment of the cytosol could 
lead to aggregation and formation of disordered protein pre- 
cipitates likely to be toxic to the cell. This is especially 
likely to be important in eukaryotes where proteins are sig- 
nificantly larger and often consist of several domains. The 
‘crowding’ effect of high cellular protein concentration must 
also contribute to a tendency towards mis-folding. Forma- 
tion of aggregates or incorrectly folded proteins could also 
have serious consequences for the rate of protein synthesis 
(since they would slow it down) and its efficiency (since un- 
desirable variability would be introduced). Mis-folding can 
occur with nascent polypeptide chains while still attached to 
the ribosome. Once synthesized, some large proteins require 
a finite amount of time to fold correctly and aggregation be- 
tween unfolded proteins is a possibility during this period. 
In any case, it is known that unfolded polypeptides do not 
survive very long in the cell as they are rapidly degraded by 
proteases. Therefore, the cell requires a system which will 
allow sufficient time for correct folding whilst simultane- 
ously protecting unfolded proteins from aggregation or pro- 


teolysis. Two general categories of proteins prevent protein 
mis-folding. Chaperones are generally small proteins that 
protect nascent polypeptide chains from aggregating before 
the chain is completely synthesized. Chaperonins are larger 
oligomeric proteins which often form large cylinders within 
which completely synthesized though unfolded proteins can 
complete folding. 

Chaperones and chaperonins were originally discovered 
as heat shock proteins in eukaryotic cells, that is proteins 
expressed in response to short periods of heating. It is most 
likely that they are induced because of the risk of thermal 
denaturation of proteins in the cell which would tend to 
lead to exposure of hydrophobic residues and consequent 
aggregation. They were designated Hsp followed by their 
apparent kDa molecular mass (e.g. Hsp 10) and their role 
in protein folding was only discovered subsequently. Chap- 
eronins were first identified in bacteria as essential host 
factors necessary for assembly of bacteriophages. We now 
know that chaperonins are a widespread family of highly 
conserved proteins which are present in bacteria and eu- 
karyote cytosol, mitochondria, endoplasmic reticulum and 
chloroplasts. The following description focuses on the main 
chaperones and chaperonins (exemplified by Hsp60 and Hsp 
70, respectively) of E. coli but there are several eukaryotic 
homologues showing close sequence, structural and func- 
tional similarities to these proteins. 

Hsp 60 and Hsp 70 are mechanistically distinct and have 
been extensively studied. These proteins are not substrate- 
specific, that is they bind to a wide range of different un- 
folded proteins with approximately the same affinity. They 
both work in conjunction with other proteins called cochap- 
erones which are specific for each chaperone but which can 
distinguish between ATP-bound and ADP-bound structural 
variants of them. Both chaperones are ATP-ases and ATP- 
hydrolysis provides energy for allosteric transitions within 
Hsp 60 and Hsp 70 which the unfolded proteins experience 
as alternating hydrophobic and hydrophilic surfaces. This 
close coupling of ATP-hydrolysis to allosteric alterations in 
chaperones is crucial to their correct function. The effect of 
these changes in structure is gradually to ease the unfolded 
protein into its correct folded structure by successive cycles 
of binding to the chaperone. Not all the details of the precise 
mechanism of chaperones have yet been completely estab- 
lished but the following is an overview of the main features 
of Hsp 70 and Hsp 60. 

Hsp 70 is also called DnaK and is specialized for binding 
to short sequences enriched in hydrophobic residues such 
as are found in nascent polypeptides being synthesized on 
ribosomes (Figure 6.7). This chaperone can therefore func- 
tion cotranslationally, that is at the same time as translation. 
DnaK can exist in two structurally different forms; an ATP- 
bound form which has high affinity for unfolded substrate 
protein and an ADP-bound form which has low affinity. 
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Figure 6.7. Functioning of Hsp 70 (DnaK) in folding of nascent polypeptides. (a) DnaK binds to hydrophobic stretches of nascent 
polypeptide chains during protein synthesis. (b) The co-chaperone, DnaJ, binds to DnaK, forming a ternary complex. This stimulates the 
ATP-ase activity of DnaK resulting in ATP being hydrolyzed to ADP. Note that the structure of DnaK-ADP differs from that of DnaK-ATP 
(c) A further protein, GrpE binds to the ternary complex, causing release of ADP and binding of ATP at DnaK. (d) The complex dissociates 
from the polypeptide. Note that several cycles of binding and dissociation can be gone through before peptide synthesis is complete. 
Moreover, several hydrophobic sites on a single polypeptide can act as substrates for the Hsp70 system. 


It forms a ternary complex with hydrophobic stretches of 
the as yet incomplete polypeptide chain and another pro- 
tein called DnaJ (Hsp 40) which stimulates the ATP-ase 
activity of DnaK. DnaJ is in fact a chaperone in its own 
right and has folding activity even in the absence of DnaK. 
A further protein (GrpE) can bind in turn to this complex 
causing the release of ADP from DnaK followed by binding 
of ATP. The complex now dissociates from the polypeptide 
chain and several cycles of binding and dissociation can be 
gone through before the polypeptide is complete and re- 
leased from the ribosome. Most proteins fold independently 
of chaperones while some only require the aid of the Hsp 
70 system to fold completely. A third group of proteins, 
however, do not fold into their native structure even with the 
help of the Hsp 70 system. 

This third group of polypeptides now passes to the sec- 
ond main class of chaperones, the chaperonins, the principal 
component of which is Hsp 60 (also known as GroEL). This 
is composed of 58 kDa subunits of which 14 are arranged 
in the form of a hollow cylinder made up of two rings of 
seven (Figure 6.8). These rings are arranged head-to-head 
and enclose a cavity calculated to have an average diam- 
eter of 45A and a length of approximately 146 A which 
would be capable of accommodating a folded polypeptide 


of maximum mass 147 kDa (assuming perfect packing). In 
practice, the maximum unfolded structure likely to be ac- 
commodated is approximately 55 kDa. Like Hsp 70, GroEL 
has ATP-ase activity and, in the absence of ATP has high 
affinity for unfolded polypeptides. Although not visible in 
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Figure 6.8. Organization of Hsp 60 chaperonin. Fourteen 58 kDa 
GroEL subunits are arranged in two circular rows of seven as shown. 
This forms a cavity within which folding can occur. Both the indi- 
vidual subunits and the complex they form are referred to as GroEl. 
Each subunit consists of three distinct structural domains; apical, 
intermediate and equatorial (which contain the ATP-ase site). Each 
equatorial domain also contains a ‘window’ sufficiently large for 
small molecules to diffuse freely through. 
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Figure 6.9. A vertical view of structural events in the GroEI/GroES system during protein folding. A transverse section of the upper (cis) 
and lower (trans) rings of seven 58 kDa subunits are shown (for clarity, two subunits from each ring are represented). (1) Unfolded protein 
binds strongly to GroEL (see also Figure 6.10). ATP hydrolysis in the trans ring causes binding of GroES to the apical domain and ATP 
binding to the seven equatorial domains of the cis ring of 58 kDa subunits (note enlarged cavity in cis ring). Simultaneously, GroES and 
folded protein are released from the trans ring. Conformational changes result in unfolded protein being released into the hydrophilic 
‘Anfinsen cage’ allowing an attempt at folding. (2) After 15s, ATP is hydrolyzed in the cis ring destabilizing the GRoEL/GroES complex. 
(3) ATP binds to the trans ring of 58kDa subunits causing release of GroES and either release of the folded protein or another cycle of 


binding/release if the protein is not yet fully folded. 


the crystal structure, the interior cavity is likely to be divided 
at the equatorial region of the complex between the upper 
and lower ring of subunits by flexible parts of the N- and 
C-termini of the subunits. Flexible regions of polypeptides 
are not visible in electron density maps from which such 
structures are determined (Section 6.5). The cavity of the 
ADP-GroEL complex is capable of accommodating several 
substrate protein molecules at a time and presents a nonpo- 
lar face made up of hydrophobic regions or ‘patches’ on the 
surface of each subunit. 

The 58kDa subunits consist of three distinct structural 
domains the equatorial domain is the largest of these and 
forms the bulk of the interface between the two rings of 
subunits and is also the site of ATP-ase activity. The apical 
domain provides most of the interactions with GroES (see 


below) and is connected to the equatorial domain by an 
intermediate domain which transmits allosteric structural 
changes between the other two domains. 

Exposed hydrophobic regions of unfolded protein bind 
at the hydrophobic face lining the interior cavity of GroEL, 
leading to ATP-ase catalyzed ATP hydrolysis in the distal 
(trans) ring of subunits (Figures 6.9 and 6.10). ATP can 
now bind to the cis ring of subunits which results in ma- 
jor structural changes in GroEL. The ATP-bound form of 
GroEL has an enclosed cavity volume which is twice that 
of the ADP-bound form (the diameter increases to 70 A). 
Moreover, ATP-GroEL has low affinity for unfolded pro- 
tein due to the exposure of polar side-chains on the previ- 
ously hydrophobic interior and also due to adjacent 58 kDa 
subunits moving further apart, thus disrupting binding of 
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Figure 6.10. Protein folding in the Hsp 60 chaperonin. Views from above (top) and horizontal with (bottom) the Gro EL cylinder are shown 
after unfolded protein enters the cavity. (a) Hydrophobic stretches of unfolded protein bind to hydrophobic ‘patches’ of the interior of Gro 
EL. The co-chaperone, Gro ES (Hsp 10) has low affinity for this form of Gro EL. (b) ATP hydrolysis in the trans ring (see Figure 6.9) of Gro 
EL together with Gro ES binding causes an allosteric alteration in Gro EL. 58 kDa subunits move apart and hydrophilic residues (grey bars) 
replace hydrophobic side-chains on the interior of the cavity which significantly enlarges. The unfolded protein is released into the cavity 
where it can fold. Gro ES stabilizes this form of Gro EL by binding at one end of the cylinder to the apical domains of the 58 kDa subunits. 
It can alternate between opposite ends of the cavity as shown by arrow in successive cycles of binding and dissociation of unfolded protein. 


substrate proteins which interact with several GroEL sub- 
units at a time. The unfolded protein consequently dis- 
sociates and is released into the cavity where it attempts 
to fold. 

In vivo, GroEL requires the co-chaperone GroES which 
is also a heat shock protein (Hsp 10) and is assembled into 
a heptamer. This acts as a ‘lid’ which covers one end of the 
GroEL cylinder (Figures 6.9 and 6.10). The initial binding 
of unfolded protein to the cis ring leads to dissociation of 
GroES from the trans ring. GroES can bind to the cis ring of 
the GroEL cylinder at the same time as ATP causing the cav- 
ity formed by the adjacent (cis) ring of 58 kDa subunits to 
enlarge relative to the distal (trans) ring by structural effects 
on the apical domain of the cis ring subunits (Figure 6.10). 
This forms a hydrophilic structure which is sometimes re- 
ferred to as the Anfinsen cage (Figure 6.10). ATP binding to 
the trans ring of 58 kDa subunits causes release of GroES 
which opens the cage allowing folded protein either to es- 
cape or, if incompletely folded, to undergo another cycle of 
binding and folding. On each cycle of binding and release 
of unfolded protein, GroES binds at alternate ends of the 


GroEL cylinder. When the protein has folded correctly, it is 
released from the cavity into the cytosol. The structure of 
the GroEL-GroES complex has now been solved by X-ray 
crystallography and its structure and function can be viewed 
on the Internet at the address given in the Bibliography. 

The GroEL/GroES system provides a dynamic surface 
environment of alternating hydrophobicity/hydrophilicity 
which is sequestered from the cytosol (Example 6.1). It 
increases the rate of individual steps in the protein folding 
pathway and protects the unfolded protein from proteolysis. 
The system also has an unfolding activity and, in the event 
of a protein folding incorrectly, can promote unfolding of 
the incorrect structure and give the protein another oppor- 
tunity to fold. While many proteins do not need this system 
to fold correctly, most do fold faster in its presence. Such 
fast-folding proteins will often fold even before a cycle of 
ATP-hydrolysis is complete, thus avoiding waste of energy 
by the cell. It has been suggested that the ATP-ase activity 
of GroEL may act as a ‘gatekeeper’ providing a selection 
mechanism for slow-folding proteins which may require one 
or several cycles of ATP-hydrolysis. 
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Example 6.1 Forced Unfolding of RUBISCO in the GroEL-GroES Complex 


Ribulose-1,5-bisphosphate carboxylase-oxygenase (RUBISCO) is a protein which cannot fold spontaneously into its 
native structure in the absence of chaperonins. It becomes blocked in an unfolded state capable of exchanging most 
of its hydrogen atoms with hydrogens from solvent water. This process can be followed by radioactively labeling 
the protein with tritium and following tritium-hydrogen exchange. It was observed that some twelve protons persist 
in unfolded RUBISCO which are not available for rapid exchange. The half-life for exchange of these protons is 
some 30 min. as compared to the more usual half-life of approximately 10 ms. The most likely reason for this is that 
they are amide hydrogens involved in secondary structure such as a-helix or -sheet and thus protected from rapid 
exchange. 

These hydrogen atoms provide a specific probe to follow events in the GroEL-GroES complex. It was observed 
that, when RUBISCO is mixed with GroEL alone, GroES alone or GroEL plus various nucleotides, the time-course of 
exchange is similar to that of the unfolded protein. However, when GroES and GroEL are added in the presence of either 
Mg-ATP or f,y-imidoadenosine 5’-triphosphate (AMP-PNP) rapid exchange occurred with ten of the twelve hydrogens 
as shown in (1). 


(1) (a) 
0 
(b) 30 
° Rb+EL+ES 
20 * Rb+ EL + ADP 
‘é 20 ] * Rb+EL + ATP 
a * Rb + EL + AMP-PNP 
D 
- 
x 
0 
(c) 30 
o Rb + EL + ES + ADP 
° Rb +EL + ES + ATP 
20 4 * Rb + EL + ES + AMP - PNP 


Time (min) 


Number of tritium atoms remaining as a function of time. Note the rapid exchange (arrow) in panel C. 


The most likely reason for this is that the energy of binding of the adenine nucleotide (rather than its hydrolysis) stimulates 
formation of the GroEL-GroES complex which disrupts whatever secondary structure is protecting most of the twelve 
protons from exchange. This is consistent with a model in which the substrate protein is ‘stretched’ to disrupt secondary 
structure, thus allowing further attempts at correct folding. The process is complete within 5s which is well within the 
13 s required for each unfolding cycle. 

This is an example of forced unfolding which is thought to be a general mechanism for overcoming the slowness of 
unfolding of proteins which have become blocked in the folding process. The structure of GroEL is shown in (2) together 
with a sketch of the stretching model. 


(Continues)_! 
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Forced unfolding in GroEL. The crystal structure of the GroEL part of the GroEL-GrES complex determined by X-ray crystallography is shown above 
while a sketch of the model is shown below (T = tritium; H = hydrogen). The unfolded structure binds to GroEL in the apical domains shown on the left 
with the distance between binding sites of 25 A. When GroES binds, these domains are forced further apart which stretches the substrate protein and widens 
the gap between binding sites to 33 A. Reprinted with permission from Shtilerman et al., Chaperonin Function: Folding by Forced Unfolding Science 284, 


A functional distinction between the Hsp 70 and Hsp 60 
systems has been drawn in that the former system seems 
to function largely to maintain the folding competence of 
proteins in the cell frequently operating co-translationally. In 
other words, Hsp 70 may not necessarily achieve complete 
folding of the protein into its native structure but rather 
may primarily prevent it aggregating which would halt the 
folding process. The Hsp 60 system, on the other hand, 
seems to be designed mainly to fold proteins completely into 
their native state and functions post-translationally. While 
some proteins pass sequentially from the Hsp 70 to the 
Hsp 60 system as described above, some in vitro evidence 
suggests that this may not necessarily be a unidirectional 
pathway, that is some proteins (e.g. firefly luciferase) may 
pass to the Hsp 70 system after prior exposure to GroEL. 


Luciferase also provides an example of a group of Hsp 60 
substrates which are ejected from the chaperonin in unfolded 
form. 

Membrane proteins are not substrates for chaperonins so 
this system is mainly concerned with folding of cytosolic 
proteins. GroEL substrates are typically in the mass range 
10-55 kDa with a clear cut-off at 55 kDa, corresponding to 
the volume of the interior cavity. It is estimated that 15-30% 
of E. coli proteins may be GroEL substrates. The majority of 
small substrates (<25 kDa) are released within a single cycle 
which corresponds closely to the time required for synthesis 
of an average E. coli protein while substrates in the mass 
range 25-55 kDa require multiple chaperonin cycles and are 
typically multi-domain proteins. Of the 4300 or so proteins 
in E. coli some 2600 are cytosolic with an average size of 
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35 kDa. Only 13% of these proteins exceed the GroEL mass 
cut-off (55 kDa). In eukaryotes, the average protein size is 
larger (53 kDa) and 38% of yeast’s 5800 proteins exceed the 
GroE] cut-off. Coupled with the fact that eukaryote protein 
synthesis is significantly slower than bacteria, this suggests 
that post-translational chaperonin activity may not be as im- 
portant in these organisms. A possible reason for this may be 
a preference for cotranslational domain-wise folding in eu- 
karyotes (i.e. folding of individual domains while translation 
of other domains is still in progress) compared to a prefer- 
ence for post-translational folding in prokaryotes. Indeed it 
has been suggested that, during evolution, domains which 
had a predisposition to fold cotranslationally may have been 
selected and this may explain why eukaryote proteins het- 
erologously expressed in bacteria so often aggregate to form 
inclusion bodies. 

Chaperones make use of ATP-hydrolysis coupled to al- 
losteric structural changes to promote protein folding in 
cells. But many of the physicochemical forces driving fold- 
ing are expected to be largely independent of chaperones and 
these proteins may merely provide a supportive microenvi- 
ronment where these forces can operate. Two main models 
for the contribution of chaperones to folding of slow-folding 
proteins have been proposed. The Anfinsen cage model sug- 
gests that intermolecular aggregation could provide a limit 
to folding and that the main role of chaperones is to avoid 
this. An alternative view is the iterative annealing model 
which proposes that the rate-limiting step in slow folding 
may be the intramolecular formation of incorrectly folded 
structures and that the role of chaperones is to unfold such 
polypeptides and give them an opportunity to fold correctly. 
These two models are not mutually exclusive and it is pos- 
sible that chaperones contribute to the folding of individual 
proteins in different ways depending on the particular rate- 
limiting step in their folding pathways. 


6.2 STRUCTURE DETERMINATION 
BY NMR 


In our previous discussion of NMR spectroscopy (Chapter 
3; Section 3.7) we saw that the chemical shift associated 
with a nucleus possessing magnetic spin may be altered 
by its precise chemical environment. Interactions between 
magnetic dipoles in different nuclei can arise as a result of 
either the bond structure of the group in which the nucleus 
is involved or the presence of nearby nuclei. These inter- 
actions provide a means by which atoms in a protein can 
be connected to one another and multi-dimensional NMR 
can be used to detect these interactions. NMR is therefore a 
major method for structure determination of biomolecules 
in solution and for his work in this area Kurt Wtitrich was 


Figure 6.11. Molecular magnetization. The two spin states possi- 
ble for a nucleus of spin J = 1/2 are shown. The lower-energy œ 
state is slightly more heavily populated than the higher-energy 8 
state. 


awarded the Nobel Prize in Chemistry in 2002. This section 
will outline the physical basis, advantages and limitations 
of this important group of techniques. 


6.2.1 Relaxation in One-Dimensional NMR 


One-dimensional NMR depends on the fact that the energy 
of nuclei which have magnetic spin (i.e. J 40) becomes 
orientation-dependent in an applied external magnetic field. 
In biological NMR the most useful nuclei are 1H, BC, DN, 
31P and |°F (Table 3.6). Because the nucleus is subatomic, 
the number of energy levels possible is constrained by quan- 
tum mechanics to 2/ + 1. For nuclei of J = 1/2, there are 
two possible energy levels which may be regarded as pre- 
cessing around the applied external magnetic field, Bo, as 
shown in Figure 6.11. As previously described (Figure 3.49), 
there is only a slight preference for the low-energy spin 
state over the high-energy one in the absence of the external 
magnetic field. Figure 6.11 is a representation of molecular 
magnetization. 

Since NMR spectra are actually determined at the macro- 
scopic level, we could alternatively represent this process as 
macroscopic magnetization (Figure 6.12). This represents 
magnetization, M as a vector quantity which is subjected to 
a torque as a result of the applied magnetic field: 


Torque = M Bo (6.8) 


This may be regarded as exposing an equilibrium mag- 
netization vector, M-, parallel to Bo, to a longitudinal com- 
ponent so that the magnetization now precesses about the 
axis of the external magnetic field. When the sample is ex- 
posed to radiofrequency radiation (which has a small mag- 
netic field associated with it), the resultant magnetic field 
Biota also exposes M, to a transverse component which is 
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Figure 6.12. Macroscopic magnetization. Magnetization is represented as an equilibrium vector quantity (M,) parallel to the direction of 
applied magnetic field (Bo). Exposure to radiowave radiation (RF) under resonance conditions causes perturbation of this vector to a value 
M. This perturbation has two components; (a) longitudinal and (b) transverse. Relaxation from this perturbed state back to the equilibrium 


state is shown. 


perpendicular to the z-axis and which precesses around it at 
the Larmor frequency (this term was previously introduced 
in Chapter 3, Section 3.7.1). 

After exposure to a pulse of radiofrequency radiation, the 
longitudinal magnetization must return to the value of the 
equilibrium magnetization vector, M,, while the transverse 
magnetization must decay to zero. The processes by which 
this happens are called, respectively, longitudinal and trans- 
verse relaxation and are quite slow relative to the time-scale 
of other spectroscopic methods (Figure 6.12). M is a bulk 
property of a magnetized sample which is evenly distributed 
throughout the solution but which nonetheless has direction. 
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Many individual resonances contribute to M and, after ex- 
posure to radiowave radiation, each will precess at its own 
characteristic Larmor frequency. 

This induces a tiny current in the receiver coil of the NMR 
spectrometer. This current has characteristics described by 
the free-induction decay (FID), namely a wavelength and a 
decay time which may be represented as a plot of intensity 
versus time. This plot is converted through the Fourier trans- 
form (Appendix 2) into a plot of intensity versus frequency; 
the NMR spectrum (Figure 6.13). A molecular species con- 
taining several nuclei (J 4 0) will produce an FID which 
is the sum of individual FIDs, each with a characteristic 


Spectrum 


Figure 6.13. Free induction decay (FID). When sample is magnetized, a component of M remains in the xy plane where it generates 
radio signals. These produce the free induction decay (FID) which decays exponentially with time. This function is directly related to the 


frequency domain (w) by the Fourier transform. 
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wavelength and decay time arising from each nucleus. 
The one-dimensional NMR spectrum of small molecules is 
structurally informative since resonances resulting from the 
experiment are well-resolved and arise from individual pro- 
tons. However, large biomolecules such as proteins and nu- 
cleic acids yield complex spectra with extensive overlap 
between individual peaks which makes structural interpre- 
tation of these samples from one-dimensional NMR spectra 
alone impossible. 

Two-dimensional NMR (2-D NMR) overcomes this dif- 
ficulty by extending measurements of these effects into a 
second dimension and, as will be described later, three- and 
four-dimensional NMR are also possible. These measure- 
ments can be used to detect interactions between magnetized 
protons which are near together in space and are of two gen- 
eral types; through-space or through-bond. Through-space 
interactions are called nuclear overhauser effects (NOEs) 
while through-bond interactions result in correlation spec- 
troscopy. The two most important types of 2-D NMR 
spectroscopy are therefore referred to as NOESY (nu- 
clear overhauser effect spectroscopy) and COSY (correlation 
spectroscopy). Both of these effects are due to interactions 
between magnetic dipoles in a pair of nuclei which are spa- 
tially near to each other and both types of experiment are 
used in the determination of structure in large biomolecules 
by NMR. 


6.2.2 The Nuclear Overhauser Effect (NOE) 


Using the molecular magnetization representation (Figure 
6.11), we can draw an energy level diagram for two protons, 
i and j, near together in space (Figure 6.14). Four spin states 
are possible which may be represented as; wa, af, Ba and 
BB. Transitions between these states may occur which can 
be followed by relaxation back to the ground state. Magne- 
tization can be exchanged between these protons by a form 
of relaxation called cross-relaxation but, at equilibrium, the 
rate of this process in both directions is the same. The cross- 
relaxation rate, o, between i and j is directly dependent on 
two parameters: 


constant Teff 
o = —_— (6.9) 


r6 


where r is the distance between the protons and Tepp is the ef- 
fective rotational correlation time of the interaction between 
the protons. When the magnetization of i is perturbed, how- 
ever, the magnetization of j will change during a time-period 
immediately after the perturbation. This change in magne- 
tization of j on perturbation of i is the NOE and the initial 
rate of build-up of this is equal to the cross-relaxation rate, 
o, and is hence proportional to 1/r°®. 
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Figure 6.14. The nuclear overhauser effect (NOE). Two protons 
(i and j) may each exist in a low-energy state (œ) or a high-energy 
state (8) in a magnetic field (Bo) as previously described (Figure 
6.11). Transitions between these states at equilibrium (arrows) occur 
by cross-relaxation with rate constants which are equal in opposite 
directions (for clarity, only kj and kK_; between 66 and Ba are 
shown). However, when this situation is perturbed by exposure to 
radiofrequency radiation, the equilibrium is disturbed (kj 4 k_1). 
A change in magnetization of i changes the magnetization of j for 
a time period after the perturbation. This is the nuclear overhauser 
effect (NOE) and the rate of build-up of this is equal to the cross- 
relaxation rate. 


This dependence of NOE on the distance between pro- 
tons, r, is the reason why they are structurally informative 
since measurement of an NOE gives direct indication of 
the magnitude of r. However, the presence of internal mo- 
tion in a protein will affect ø in Equation (6.9). Therefore 
Teff Can vary in a manner dependent on the location of a 
proton in the protein and this may place a limitation on 
the direct measurement of r values from NOEs. Moreover, 
NOEs are determined with a finite experimental accuracy 
and may be subject to quenching effects due, for example, 
to the presence of paramagnetic metal ions as contaminants 
in the sample. For these reasons, it is more difficult to obtain 
reliable lower limit distance estimates from NOEs than up- 
per limit distance estimates. In practice, the van der Waals 
radius is taken as the lower limit for estimates of r while 
upper limits can be calculated safely from NOEs. Because 
of the strong dependence of NOEs on r, they only arise 
between protons located within 5 A of each other and are 
classified into strong (1.8-2.7 A), medium (1.8-3.3 A) and 
weak (1.8-5 A) effects. 

Clearly, the variation of r between two protons in the 
range between the van der Waals radius (approximately 2 A) 
and the upper limit of an NOE (5 A) would allow a relatively 
large amount of conformational space for these protons to 
arrange themselves relative to each other. These values rep- 
resent a distance constraint. However, both i and j are also 
involved in NOEs with other protons (e.g. h, k, l, etc.) and 
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Figure 6.15. Distance constraints determined by NOEs. A NOE 
(double arrow) between protons i and j would be equally consis- 
tent with j occupying any location within the large circle (dashed) 
around proton i. However, identification of a second NOE between 
protons j and k suggests that j is likely to occupy the hatched area. 
Note that there is no NOE between protons i and k since they are 
more than 5 A apart. In reality, many NOEs between many protons 
are used to produce a list of distance constraints which operate in 
three dimensions rather than the simplified two-dimensional repre- 
sentation shown here. 


these narrow down the amount of space which would be 
consistent with NOEs between individual pairs of protons 
(Figure 6.15). NOESY spectra produce a list of interpro- 
ton distance constraints for many individual protons in the 
protein. Since there are only a small number of structural 
arrangements which would satisfy this list of distance con- 
straints, an accurate indication of three-dimensional struc- 
ture is possible from NOESY spectra (see below). 


6.2.3 Correlation Spectroscopy (COSY) 


In Section 3.7.4 we saw that peak-splitting in one- 
dimensional NMR can result from a process variously called 
spin-spin coupling, J-coupling or scalar coupling. This is 
due to spin-pairing between pairs of nuclei (e.g. i and j) and 
electrons located in the bonds between them and are called 
correlation effects. Unlike the NOE, these operate through 
covalent bonds. The energy of this interaction, E, is given 
by 

E =hJ hlj (6.10) 
where h is Planck’s constant and J is the coupling constant 
(see Section 3.7.4). Karplus calculated from theory that the 


three-bond coupling constant ? J between two protons i and 
j in ((H-C-C-H’) has a form dependent on the dihedral 
angle 8 as follows: 


37 = A+ Bcos0 + C cos? 0 (6.11) 


where A, B and C are coefficients dependent on substituent 
electronegativity. This Karplus equation can be used to es- 
timate the @ angle in polypeptides (see below) using mea- 
surements of coupling constants derived from 2-D-NMR. 

All 2-D-NMR experiments follow the general scheme 
outlined in Figure 6.16. The result of this sequence is a 
series of FIDs which are detected and form the data input 
subsequently analysed to produce a structure. The effects 
and significance of the various time-periods into which the 
experiment is divided on the magnetization vector are de- 
scribed in more detail in Figure 6.17. 

After a preparation period, sample is exposed to a ra- 
diofrequency pulse of radiation which magnetizes each pro- 
ton in the sample. This has the effect of forcing M from 
the z-axis into the x-y plane as previously described in Fig- 
ure 6.12. This is called the evolution period and various 
times (tı) are allowed for this varying from 0 to a maxi- 
mum value (ti max) which may be several seconds. Sample 
is then exposed to a second radiofrequency pulse which 
rotates the plane of the magnetization vectors into the x-z 
plane. This is called the mixing period (Tm). The magni- 
tude of the vector rotated in this step will vary depending 
on the duration of tı. During Tm, z-components of the mag- 
netization vectors are selected which vary depending on 
the length of the evolution period (i.e. value of tı). These 
z-components are now said to be frequency-labeled since 
their precession frequency will vary in a manner which de- 
pends directly on t,. A third radiofrequency pulse now al- 
lows us to ‘read’ these frequencies by inducing transverse 
relaxation which results in an FID which is detected and 
analysed. 

This experiment results in a data matrix, s(t), t2). 2- 
Dimensional Fourier transformation of this matrix yields 
a 2-D frequency spectrum S(@,, œ) where w corresponds 
to chemical shifts (i.e. PPM) in one-dimensional NMR. A 
key point to understand is that the diagonal of this spectrum 
corresponds to the one-dimensional spectrum for the sample 
where w; = w2. 

COSY spectroscopy allows identification of through- 
bond coupling effects in an experiment distinct from 
NOESY as off-diagonal peaks (sometimes referred to as 
cross-peaks; w # œ). Like NOEs, spin-spin coupling be- 
comes weaker with distance and correlation effects are only 
observed when two nuclei are linked by up to a maximum 
of three bonds. Correlation effects are in principle possi- 
ble between any pair of nuclei possessing spin but, for the 
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Figure 6.16. Diagrammatic overview of COSY 2-D NMR experiment. After a preparation period, sample is exposed to three pulses of 
radiofrequency (RF) radiation with various time delays. The first pulse is followed by an evolution period (tı) which varies from 0 to ti max- 
The second pulse is followed by a mixing period (Tm), while a detection period (t2) follows the third pulse. FIDs are collected for each value 
of tı and produce a two-dimensional time domain data matrix, s(t, t2). This is Fourier transformed first with respect to t2. This produces 
a one-dimensional spectrum in which the peak amplitude varies regularly as a cosine function of t}. A second Fourier transform is then 
performed with respect to t; to produce a two-dimensional spectrum (w1, w2). The diagonal of this plot is identical to a one-dimensional 
NMR spectrum. For simplicity, the spectrum from a single NMR resonance is presented. If multiple resonances were present, correlation 


effects would be detected as spots off the diagonal (w; 4 w2). 


purposes of protein structure determination in small pro- 
teins (M, < 15kDa), the most widely used COSY exper- 
iment is that involving homonuclear interactions between 
protons (i.e. 'H-'H). For larger proteins (M, = 15-30 kDa) 
heteronuclear interactions are determined while other meth- 
ods are required for proteins of M, > 30 kDa. 


Amino acid residues, by definition have unique covalent 
bonding patterns, they represent distinct spin systems which 
can be identified by COSY spectroscopy (Figure 6.18). 
Moreover, the dependence of the coupling constant, J, on 
the dihedral angles between groups linked by a covalent 
bond allows determination of @ using the Karplus equation 
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Figure 6.17. Changes in magnetization vector during NOESY. The first radiofrequency (RF) pulse forces the magnetization vector, M, into 
the x-y plane (dashed line). The vector precesses around the z-axis and is sampled at various times (tı) during the evolution period. After 
tı, sample is exposed to a second RF pulse and M is again forced through 90° into the x-z plane. A time-period called the mixing period 
(Tm) is now allowed at the end of which longitudinal components (i.e. those along the z-axis) of M are selected by a third RF pulse and 
again rotated through 90°. Cross-relaxation happens during Tm as a result of NOEs. These are evident in the two-dimensional spectrum as 
wı 4 œ (Figure 6.20). Note that shorter ¢; results in an outcome from the series of pulses which differs from that from longer tı values as 


illustrated by the faint arrow (in lower panel). 


(Equation (6.11)). Dihedral angles are limited to only cer- 
tain ranges in proteins and are crucial for defining the con- 
formation of the overall polypeptide chain (Figure 6.19). 
COSY spectroscopy can yield a list of torsion angle re- 
straints for the protein. Together with the list of distance 
constraints generated by NOESY, these form the major data 
inputs in determination of protein three-dimensional struc- 
ture by multidimensional NMR. 


Cysteine Threonine 

Figure 6.18. Amino acid residues are distinct spin systems. 
J-coupling between protons is shown for cysteine and threonine 
residues. These couplings operate through a maximum of three co- 
valent bonds and form the basis of off-diagonal peaks in COSY 
spectroscopy. In addition to !H-!H homonuclear coupling, het- 
eronuclear coupling is also possible. Each amino acid is a dis- 
tinct spin system which gives a characteristic ‘fingerprint’ for each 
residue type. 


6.2.4 Nuclear Overhauser Effect Spectroscopy 
(NOESY) 


NOESY spectroscopy uses generally the same sequence of 
RF pulses and time-periods as shown for COSY in Fig- 
ure 6.16. The diagonal of a NOESY spectrum consists of 
resonances from protons which have not undergone cross- 
relaxation during Tm (i.e. œ; = @2) while off-diagonal 
peaks (w; # w2) represent through-space NOE exchanges of 
magnetization between pairs of protons arising from cross- 
relaxation during the mixing period (Tm). 


Figure 6.19. Torsion angles in polypeptides. Rotation is possible 
around three types of bonds in polypeptides since the peptide bond 
is essentially planar (dashed plane). ø is the C-N bond, ¢ is the 
C-(C=O) bond while x is rotation around side-chain bonds. Not 
all values are sterically allowed for these torsion angles. Their values 
are related to coupling constants which may be determined from 
COSY spectra by simple geometric relationships. Note that some 
residues may have more than one x bond (e.g. valine). 
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Figure 6.20. NOESY spectrum of lysozyme. A typical spectrum 
of 2mM hen lysozyme in 90% H20/10% D20 at 298 K, pH 6.0. 
Note that the diagonal (w; = w2) represents many resonances while 
NOEs between protons appear off the diagonal as (w1 4 w2). (Cour- 
tesy of Dr L.-Y. Lian, Biological NMR Centre, Leicester, UK). 


In a real NOESY experiment thousands of NOEs may 
be visible in such a spectrum but, because of the two- 
dimensional presentation of the data, it is possible to resolve 
individual resonances in a 2-D NMR experiment which 
would be undetectable in a one-dimensional spectrum due 
to spectral overlap (Figure 6.20). 


6.2.5 Sequential Assignment and Structure 
Elucidation 


The overall strategy for determining structure from 2-D 
NMR for proteins up to approximately 100 residues was 
developed by Kurt Wiitrich in the early 1980s. More sophis- 
ticated NMR experiments suitable for larger proteins based 
on this overall approach were developed later and these are 
briefly described below. We can summarize the approach in 
four main steps (Figure 6.21): 


1. Resonances in 2-D NMR spectra are assigned to indi- 
vidual amino acids in the sequence in a process called 
sequential resonance assignment. This involves a com- 
bination of COSY and NOESY experiments allowing 
the identification on the diagonal of such spectrum (i.e. 
@, = @ ) of resonances associated with specific indi- 
vidual amino acids in the sequence. NOE interactions 
between adjacent residues in the sequence (e.g. -NH 
(i + 1) with œH (a) are identified in this process and, for 
example, allow unambiguous identification of a specific 
residue as being adjacent to two others, one on either side 
of it, in the primary structure. 


2. Bond-angle restraints are identified from COSY mea- 
surements and intraresidue and sequential inter-residue 
NOE data allowing determination of conformation limits 
on the polypeptide chain. 

3. Through-space inter-residue NOEs between individual 
residues which are far apart in the sequence are identified. 

4. Based on data inputs from steps 1-3 a three-dimensional 
structure consistent with the 2-D NMR data is calculated 
using a variety of computer algorithms. This structure is 
then refined to produce a high-resolution structure. 


To make this clearer, let us consider a protein of known 
primary structure (Figure 6.22). There are three glycine 
residues in this sequence (at positions 10, 16 and 35). 
Glycine 16 is in a particular sequence in which it is flanked 
by leucine and tyrosine residues. COSY experiments al- 
low us to identify positions corresponding to the glycine 
‘fingerprint’ on the 2-D NMR diagonal (w; = w2) but at this 
stage we do not know which position corresponds to residue 
10, 16 or 35. However, because of their proximity within the 
primary structure, we expect the NOESY spectrum to reveal 
off-diagonal resonances corresponding to an interaction be- 
tween glycine-16 and leucine-15 and between glycine-16 
and tyrosine-17. This allows us to identify the diagonal po- 
sition of resonances due to glycine-16 in an unambiguous 
way which distinguishes it from the other two glycines in 
the sequence. We also now know the diagonal positions of 
leucine-15 and tyrosine-17. In principle, we can proceed 
to identify NOEs between tyrosine-17 and the residue at 
position 18 and between leucine-15 and the residue at posi- 
tion 14 by simply repeating this process in an iterative way 
along the sequence. At the end of this process we should 
know the diagonal positions of each residue in the sequence 
(see Example 6.2). 

Bond angles along the polypeptide chain are now de- 
termined by a combination of COSY and NOESY. These 
define value-ranges for ¢, g and x angles shown in Figure 
6.20. This is followed by inspecting the NOESY spectrum 
for through-space interactions between resonances which 
are far apart in the sequence but within 5 A of each other 
in conformational space. This generates a list of distance 
constraints which may be analysed by computer algorithms. 
The most usual approach is the use of distance geometry 
which defines a family of three-dimensional conformations 
consistent with the observed NOEs based on the type of 
analysis previously shown in Figure 6.15. Since there are 
only a limited number of ways in which a polypeptide of 
known sequence can fold up to produce a particular pattern 
of NOEs while at the same time staying within the torsion 
angle ranges defined by COSY and without violating the 
well-established geometrical principles governing protein 
structure, 2-D NMR usually results in a unique family of 
conformations. 
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Figure 6.21. Overall approach to structure determination by 2-D NMR. Step 1. Based on a combination of NOESY and COSY experiments, 
resonances are assigned to individual residues in the sequence (e.g. cysteine-threonine). Each will have a distinctive ‘fingerprint’ in COSY 
(correlation effects) as well as characteristic intramolecular NOEs as shown. Moreover, intermolecular NOEs between adjacent residues 
are also identifiable by NOESY. Step 2 COSY spectroscopy identifies value ranges for torsion angles since these are directly related to the 
coupling constant, J. This produces a list of torsion angle restraints for each residue in the sequence. Step 3. Through-space intermolecular 
NOEs identify interactions between residues which may be far apart in the primary structure (dashed line). Two sequentially distant parts of 
the polypeptide brought together by a hydrogen bond are shown. This produces a list of distance restraints. Step 4. Using distance geometry 
or other methods, the list of distance and angle restraints are used to calculate a structure for the polypeptide. This is refined to produce a 
high-resolution structure. 
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Figure 6.22. Diagrammatic representation of sequential resonance assignment in 2-D NMR. A protein containing the sequence Leu-Gly-Tyr 
is under investigation. This protein contains three glycines (at positions 10, 16 and 35). (a) Resonances on the diagonal correspond to individual 
amino acid residues. Each type of amino acid gives a distinct COSY spectrum. NOEs between adjacent residues may be identified in the 
NOESY spectrum and this identifies the diagonal position of Glycine-16. This process is repeated for every residue in the sequence. (b) 
Intermolecular NOEs between sequentially distant parts of the protein are then identified from NOESY spectra since diagonal positions of 
each residue are now known. 
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Example 6.2 Structure of a Protein Determined by Multi-dimensional NUR 


A3-3-Ketosteroid isomerase (KSI) is a dimeric enzyme which catalyzes highly efficient isomerization of various 
ß.y -unsaturated oxo-steroids. Despite the fact that crystals of this protein were obtained in 1976 and a low-resolution 
structure determined to 6 A resolution was published in 1984, determination of a high-resolution structure had proved 
elusive. For this reason heteronuclear, multi-dimensional NMR was carried out to determine the structure of this protein. 

ISN- and !°NC-labelled KSI samples were obtained from bacteria grown in media rich in !îN and "C. Ten percent 
dioxane was included in solvent to inhibit aggregation at concentrations greater than 80 uM. Four-dimensional '°N, 
13C- and '3C, °C NOESY spectra were generated and used to assign backbone and side-chain signals. A list of 1865 
distance constraints ((1) below) was used as an input to generate 20 individual distance geometry structures which are 
shown in (2). 

Disorder at the dimer interface is responsible for the greater convergence of the family of monomeric structures 
compared to that for the dimer. The model agreed well with previously published structure—function work on this enzyme 
and allowed identification of important active-site residues. 


(1) 


Distance restraints generated by multi-dimensional, heteronuclear NMR 


Intraresidue 49 
Sequential 306 
Medium range (|i — j| = 2-5 residues) 351 
Long range (|i — j| > 5 residues) 545 
Intermolecular 60 
Hydrogen bond restraints 554 
Total NMR-derived restraints 1865 
Mean restraints/residue 15.2 
(2) (a) 


(b) 


Superposition of the family of 20 structures generated for the dimer (A) and the monomer (B) of KSI. Reprinted with permission from Wu et al., Solution 
Structure of 3-oxo-A>-Steroid Isomerase, Science 276, 325-327 (1997) American Association for the Advancement of Science. 
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At this stage, the overall fold of the polypeptide chain 
and secondary structure features such as alpha-helices and 
beta-sheet are clearly defined. The final step in the proce- 
dure is refinement of the structure to a resolution of a few 
A. This will often involve the use of statistical information 
obtained from high-resolution databases of structures de- 
rived from crystallographic measurements as well as more 
sophisticated analysis of NMR data. More information on 
protein structure refinement is given below in Section 6.5.8. 


6.2.6 Multi-Dimensional NMR 


The approach to structure determination which has just been 
described has been successfully used to determine three- 
dimensional structures to a resolution comparable to 2.5 A 
crystal structures for proteins up to M, of 15 kDa. However, 
because of problems in assigning resonances unambigu- 
ously, spectral overlap and the large number of resonances 
obtained in 2-D NMR, the technique is not generally ap- 
propriate for larger proteins (>15kDa). Moreover, some 
proteins smaller than 15 kDa may produce spectra of such 
complexity that their structures cannot be determined us- 
ing 2-D homonuclear proton NMR alone. For such difficult 
samples we may require NMR spectroscopy in a third or 
fourth dimension, that is 3-D and 4-D NMR (Example 6.2). 

These experiments may be simply regarded as linear com- 
binations of 2-D NMR sequences of the type previously de- 
scribed (Figure 6.16) with the proviso that there is a single 
preparation period at the beginning and a single detection 
period at the end of the sequence. Thus, 3-D NMR consists 
of two successive 2-D NMR sequences while 4-D NMR 
consists of three (Figure 6.23). The evolution (t2) and mix- 
ing (Tm) periods of each 2-D NMR sequence are varied 
independently of (t2, Tm) values of other sequences. 

3-D and 4-D NMR experiments make use of heteronu- 
clear correlation effects along the polypeptide backbone 
(Example 6.3). In these experiments, the sample protein is 
isotopically labeled uniformly with '°N and °C. These iso- 
topes are introduced into the protein by overexpression in 
bacteria grown on NH4Cl and 'C-glucose as their ex- 
clusive sources of nitrogen and carbon. For larger proteins, 
specific isotopic labeling may be used in which isotopes are 
selectively introduced at known sites. These isotopes allow 
2-D proton homonuclear NMR to be followed by heteronu- 
clear NMR in such a way that 3-D NMR generates a (a), 
@ , wz) data-set while 4-D NMR generates (w1, @2, @3, W4) 
(Figure 6.24). These experiments facilitate resonance as- 
signments to much higher resolution than 2-D NMR alone 
thus allowing structure determination of larger proteins or 
of smaller proteins for which 2-D NMR is insufficient. This 
approach has been successfully used to determine protein 
structure up to 30 kDa. 
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Figure 6.23. Multi-dimensional NMR. Pulse-sequences responsi- 
ble for (a) 2-D NMR, (b) 3-D NMR and (c) 4-D NMR are built 
up by successive 2-D NMR pulses. Successive evolution (t1) and 
mixing (Tm) time-period ranges are denoted by superscripts a, b 
and c. Note that time-period ranges in successive 2-D NMR pulses 
are varied independently of each other.. 


6.2.7 Other Applications of Multi-Dimensional NUR 


Our discussion of multi-dimensional NMR has thus far fo- 
cused almost exclusively on the determination of three- 
dimensional structures of proteins (Example 6.2). How- 
ever, this group of techniques is also useful as a gen- 
eral probe for protein structure which can also shed light 
on aspects of protein dynamics such as enzyme catalysis, 
protein-protein complex formation, ligand binding and pro- 
tein folding (Example 6.3). In principle, any biomolecule 
containing atoms possessing spin can give an NMR spec- 
trum and resonances due to individual atoms can be followed 
by multi-dimensional methods. Although, high-resolution 
structural information may not always be obtainable from 
such spectra, they can nonetheless yield important clues 
to shape or function. The technique is therefore generally 
applicable to other classes of biomolecules such as DNA 
and carbohydrates and a similar approach can be taken to 
analysis of structure and function in these molecules as for 
proteins (Table 6.2). 

In the case of DNA, multidimensional NMR studies have 
tended to concentrate on short oligonucleotides (10-20 bp) 
rather than large structures. This can yield structural in- 
formation on functionally important parts of DNA such as 
promoters and operators. Since there are only four bases 
commonly found in DNA (compared to twenty amino acid 
residues in proteins), proton NMR spin systems are more 
easily assigned to bases than to amino acids. The major 
interproton distances measured from NOESY in DNA sam- 
ples are between sequentially adjacent bases rather than the 
more long-distance interactions which we have come across 
in protein structure. NOEs are also measurable between pro- 
tons on separate, individual strands of double-stranded DNA 
which happen to be brought close together by hydrogen 
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Example 6.3 Using Two-dimensional NMR to Follow in vitro Evolution of a Protein Fold 


Comparison of related protein sequences suggest that mutation of one amino acid residue for another is a major factor 
in the evolution of novel proteins. When such changes are deliberately made to proteins by site-directed mutagenesis, 
they often result in altered biological activity and stability. However, mutation rarely results in change in the overall 
fold of the protein. The most likely reason for this is delocalization of information specifying a particular fold over the 
whole protein rather than merely being specified by the sequence alone (Section 6.1.1). A mutagenesis study of the Arc 
repressor provides an interesting exception to this rule. 

Residue-by-residue differences in chemical shift show most changes concentrated between residues 9 and 14 

Portion of NOESY spectrum showing cross-peaks between Leu-21 and other residues (numbered) in mutant Arc. 
NOEs shown as solid lines are not consistent with wild-type structure while those shown with dashed lines are. 
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Representation of the structure of Arc from two perspectives (left and right). Structure of residues 8-46 of the wild 
type is shown in panels A and B while that of residues 15 to 46 of the mutant is shown in C and D; 


$h, 


C D 


Sir 


(3) 
A 


(Continues)_! 


THREE-DIMENSIONAL STRUCTURE DETERMINATION OF MACROMOLECULES 223 


M (Continued) 


Arc is a homodimer each subunit of which contains a crucial sequence (-GIn?—Phe—Asn—Leu—Arg-Trp!*_). This sequence 
contributes to an antiparallel 6-sheet in the dimer. A feature of this sequence is that even-numbered side-chains are 
hydrophobic and buried in the core of the protein where they contribute to folding and stability while odd-numbered 
side-chains are polar and located on the surface of the protein where they can bind to operator DNA. 

Mutations were introduced into this protein which resulted in an interchange of residue 11 for residue 12 (i.e. resulting 
in the sequence; —GIn?—Phe-Leu-Asn—Arg-Trp'*_-). Spectroscopic investigations such as fluorescence and circular 
dichroism (Chapter 3) showed that this mutation had disrupted the structure of the protein while only slightly affecting 
its thermal stability. Heteronuclear two-dimensional NMR was used to study NOEs between 'H and !°N and are shown 
in (1) and (2). 

Analysis of the chemical shifts for each residue revealed that the bulk of the differences between the NMR spectra of 
the mutant and wild-type proteins is concentrated between residues 9 and 14. This suggests a changed structure in this 
region while the rest of the protein is largely unchanged. 

A family of three-dimensional structures of the mutant Arc determined from 99 NOEs was determined and the averaged 
structure is shown in (3). This reveals a localized conformational change in the region of residues 9-14 from f-sheet 
(wild type) to helix (mutant). This was rationalized in terms of the amphiphilicity of the mutant sequence PHHPPH 
(where P is polar and H is hydrophobic) which is consistent with a helical periodicity (3—4 residues per turn) rather than 
the PHPHPH periodicity of the wild type (consistent with a 6-strand in which H is on one side and P is on the other). 

Images reprinted with permission from Cordes et al., (1999) Evolution of a Protein Fold in Vito, Science 284, 325-327, 
American Association for the Advancement of Science. 
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Figure 6.24. Multidimensional NMR. In a homonuclear 2-D NMR experiment a number of resonances are observed which have the same 
value of w2. 3-D heteronuclear NMR allows further resolution of these resonances depending on the identity of the heavier atom (nitrogen 
or carbon) to which one of the protons is attached. This data-set consists of (w1, w2, w3) resonances due, respectively to (H, ISBN or BC, 
1H) 4-D heteronuclear NMR allows unique resolution of all resonances producing a data-set of values (w1, w2, w3, w4) corresponding 
respectively to (3C, 'H, °C, 'H). For clarity, only two sets of values (a and b) are shown for 3. Resonances due to each proton of the 
homonuclear proton pair are uniquely defined in terms of the chemical shifts of the heavier atoms to which they are attached. 
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Table 6.2. Selected applications of NMR to the study of 
biomolecules. 


Type of NMR Sample 


1-D proton NMR Small molecules (e.g. amino 
acids) 


Homonuclear proton Proteins (<15 kDa) 


2-D NMR 

Heteronuclear Proteins up to 30 kDa 
3-D and 4-D NMR 

Heteronuclear DNA up to 20 bp 


3-D and 4-D NMR 
Solid-state multidimensional 
NMR 


Membrane proteins (up to 
30 kDa) 


Solid-state °!p 
2-D magic angle NMR DNA fibres (to distinguish 


conformation) 


bonding. Important contributions of NMR to structural stud- 
ies of DNA include the identification of novel secondary 
structures such as the triple-helix, novel higher-order struc- 
ture such as hairpins and monitoring of drug—DNA and 
protein-DNA interactions. 

NMR has also been used in structural investigations 
of RNA which can adopt a much wider range of three- 
dimensional structures than DNA and for which very few 
crystal structures have been determined. This is important 
because knowledge of structural alterations in solution is 
essential to understanding interactions between RNA and 
proteins or DNA. The advent of solid phase oligonucleotide 
synthesis has allowed the production of large quantities of 
oligonucleotides capable of adopting a wide range of sec- 
ondary structures suitable for NMR investigation. 

Biological membrane components are not amenable to 
investigation by crystallography but are of major biochem- 
ical interest. In particular, heteronuclear NMR has been 
used to investigate the structure of phospholipid bilayers 
and membrane-bound proteins as well as interactions be- 
tween them. Recent improvements in solid-state NMR are 
of particular importance to these investigations. This tech- 
nique is applicable to phospholipid bilayers, DNA fibres, 
crystals, amyloid fibrils and membrane-bound samples such 
as receptors. It depends on the fact that these samples are 
anisotropic, that is they have a definite orientation and 
are not tumbling freely in solution in a random manner 
(Figure 6.25). 

Solution-state NMR depends on a resonance interaction 
in which angle-dependent mathematical terms are normally 
averaged to zero due to random tumbling. Samples investi- 
gated by solution-state NMR are said to be isotropic which 


means that the angle between the magnetic field, Bo and the 
sample rotation axis does not affect the spectrum obtained. 
When anisotropic samples are analysed by solution-state 
NMR, however, they produce very noisy spectra with broad 
peaks which are difficult to interpret. If the sample is in- 
stead analysed in a solid-state, an angle can be selected 
which removes the noise from the spectrum yielding a high- 
resolution result. This angle is sometimes called the magic 
angle and it has a value of 54.7°. This means that solid-state 
heteronuclear multidimensional NMR of anisotropic sam- 
ples can be used to determine high-resolution structures of 
samples which are impossible to analyse structurally either 
by crystallography or by solution-state NMR. This approach 
has determined structures of membrane-bound proteins up to 
M; 30 kDa and allowed structural and functional analysis of 
heterogenous noncrystalline samples including membrane- 
bound peptides and proteins. 


6.2.8 Limitations and Advantages 
of Multi-Dimensional NMR 


Table 6.2 gives a list of selected applications of NMR to 
the study of biomolecules. From this it will be clear that 
this group of techniques facilitates analysis of a wide range 
of size (up to 30kDa), type (building-block molecules 
to proteins and macromolecular complexes) and state 
(solution-state/solid-state; | homogeneous/heterogeneous; 
soluble/membrane-bound). It is likely that new methods of 
data analysis and resonance assignment will increase the 
protein mass-range amenable to study by NMR into the 
region 30-50 kDa in the foreseeable future. This will bring 
the subunits of many oligomeric proteins of interest within 
reach of NMR. 

In addition to the versatility of the technique, it has the 
distinct advantage that it is gentle and nondestructive facil- 
itating analysis under conditions which are close to phys- 
iological. It is also possible to include ligands and other 
biopolymers such as DNA or protein with the sample to 
follow effects on the sample molecule and to incorporate 
radioisotopes into the molecule to study individual atoms. 
Analysis can also be carried out along a time-course to fol- 
low dynamic processes such as protein folding. The main 
advantages relative to crystallography are that molecules for 
which it is impossible to grow crystals (e.g. membrane pro- 
teins) can be analysed by NMR and that the measurement 
takes place under conditions which are significantly closer 
to physiological. Moreover, flexible regions of proteins are 
not apparent in electron density maps determined by crys- 
tallography but are detectable by NMR. Direct comparisons 
between NMR and crystallography suggest that the resolu- 
tion of NMR is close to that of crystallography in the region 
of 2.5A. 


THREE-DIMENSIONAL STRUCTURE DETERMINATION OF MACROMOLECULES 225 


(a) (b) 


Solution-state 


NMR 


Randomly tumbling 
sample 


The 
“magic 
angle’ 


Solid-state 
NMR 


Rotation 


‘ iaat x 


54.7° 


Orientated 
sample 


Figure 6.25. Solid-state NMR. Freely tumbling samples (a) are isotropic while orientated samples (b) are anisotropic. Solid-state NMR 
involves analysis of solid-state, orientated samples at the magic angle of 54.7°. At this angle, high-resolution spectra are obtainable with 


solid-state samples. 


NMR measurements have a number of important limita- 
tions, however, which place limits on their general applica- 
bility. Many of these relate to the size and type of sample 
which can be analysed as discussed above. Protein NMR 
requires a high sample concentration (0.5-2 mM) at temper- 
atures above 15°C to allow detection of resonances with a 
high signal: noise ratio. Some proteins may aggregate under 
these conditions which precludes their analysis by this group 
of techniques. Since it is usually proton resonances which 
are measured in the analysis, it is essential that the sample 
is fully protonated for the duration of the experiment. At 
high pH values, the rate of proton exchange with solvent is 
high relative to the time-scale of the experiment. In practice, 
this means that multidimensional NMR measurements are 
performed in the pH range 4-8. This precludes the study of 
pH effects in the alkaline range and also places a limitation 
on the pH-stability of the sample since some proteins may 
not be fully stable or fully bioactive below pH 6.0. More- 
over, the refined structure resulting from the experiment 
outlined in Figure 6.21 represents a family of equally possi- 
ble structures and it can be difficult to distinguish between 
these statistically. In comparison to X-ray crystallography, it 
is particularly difficult accurately to quantify error through 
the structure and this error may vary from one part of the 
sequence to another and from one structure to another. 

Information obtained from NMR studies is particularly 
important because it is independent of, though complemen- 
tary to, data from crystallographic experiments, thus facili- 
tating detection of structure from molecules which may not 
readily crystallize. We will compare the two techniques at 


the end of this chapter but we must now turn to a description 
of the main method for structure determination in the solid 
state; X-ray diffraction. 


6.3 CRYSTALLIZATION OF 
BIOMACROMOLECULES 


The second widespread technique for determination of 
three-dimensional structure of biomacromolecules is based 
on the study of X-ray diffraction patterns by crystals of pro- 
tein/DNA called X-ray crystallography. The basic outline 
of this experiment is shown in Figure 6.26. The technique 
is equally applicable to structure determination of small 
molecules such as those encountered in organic/inorganic 
chemistry and is based on the fact that atoms diffract X-rays 
in a pattern which is dependent on their location in three- 
dimensional space. The reason X-rays are used is that their 
wavelength range is of the same order of magnitude as chem- 
ical bonds thus allowing us to ‘see’ at a resolution equivalent 
to interatomic distances (i.e. 0.8-2.5 Å). To detect diffracted 
X-rays with high enough sensitivity, it is essential that many 
atoms contribute to the diffraction pattern obtained. In prac- 
tice, this means that the molecule under investigation must 
be present in the ordered three-dimensional array of a crys- 
tal so that many equivalent atoms in different molecules 
contribute to the diffraction pattern. 

In the case of biomacromolecules, it is often difficult 
to obtain crystals of adequate quality and this is a major 
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Figure 6.26. X-ray diffraction experiment. A beam of X-rays is 
passed through a crystal. Most of the beam passes straight through 
but some of the beam is diffracted by the systematic arrangement 
of atoms in the crystal. When a monochromatic X-ray beam is 
used, it is necessary to rotate or oscillate the crystal slightly during 
the experiment (Section 6.3.2). Diffracted beams are detected as a 
two-dimensional array of spots of particular position and intensity 
forming the diffraction pattern. The crystal is then rotated through a 
small angle (e.g. 1°) and the measurement is repeated. At each angle 
some spots disappear, others appear and individual spot intensities 
change. 


limitation of this approach to structure determination. In 
particular, it means that biomolecules which do not form 
crystals (e.g. intrinsic membrane-bound proteins) are not 
immediately amenable to study by this technique. Notwith- 
standing this limitation, X-ray crystallography has resulted 
in determination of three-dimensional structures for thou- 
sands of proteins and is still far more widely used than 
multi-dimensional NMR for high-resolution structure de- 
termination. It has made a number of major contributions 
to biochemistry such as the demonstration that enzymes are 
protein in nature, the Watson—Crick model of DNA (al- 
though diffraction in this case was by DNA fibres) and 
understanding the molecular basis of phenomena such as 
allosterism and protein/DNA interactions. Moreover, struc- 
tures derived from X-ray crystallography provide the major 
experimental link between structure and function underlying 
most modern biochemistry and molecular biology. 

In this section we will describe the symmetry properties 
of crystals and how crystals of biomacromolecules suitable 
for X-ray diffraction work may be obtained. In following 
sections we will look at the diffraction experiment and how 
structure may be calculated from diffraction patterns. 


6.3.1 What are Crystals? 


A crystal (from the Greek krustallos meaning clear ice) can 
be defined as a regularly repeating three-dimensional array 
of atoms or molecules. Some crystals are formed naturally 
such as table salt (NaCl), graphite and diamonds (crystalline 
forms of carbon) and quartz (SiO2). These crystals are held 
together by strong ionic interactions (e.g. Na*—Cl-) which 
make them quite mechanically stable. Crystals of protein 
and DNA are far more fragile than these materials because 
they are held together by weak interactions such as hydro- 
gen bonds and van der Waals forces. Moreover, protein/DNA 


crystals are far more difficult to grow than such naturally oc- 
curring organic and inorganic crystals. Notwithstanding this, 
much of the terminology used in biomolecular crystallogra- 
phy derives from early work on organic/inorganic crystals 
and, in principle, the same overall experimental approach is 
followed in protein/DNA X-ray diffraction experiments as 
those with small molecule crystals. 

Diffraction is the process whereby electromagnetic radia- 
tion is scattered by matter. The phenomenon is not unique to 
X-rays. It also occurs with visible light and it is the focusing 
of diffracted visible light by the lens in optical microscopes 
which allows its recombination to form an image of the ob- 
ject. If it were possible to do the same thing with diffracted 
X-rays, then we could simply build a microscope to view 
the atomic structure of crystals directly. However, X-rays 
cannot be focused and, for this reason, atomic arrangements 
within the crystal must instead be determined by calcula- 
tion based on the X-ray diffraction pattern obtained (Section 
6.4). This calculation is possible or helped through the use of 
geometry to describe the arrangement of atoms or molecules 
within the crystal. 


6.3.2 Symmetry in Crystals 


The basic building block of a crystal is the unit cell which is a 
microscopic repeating unit within the crystal (Figure 6.27). 
This may be thought of as a ‘box’ of six faces or sides, 
defined by three lengths a, b, c (one for each edge of the 
box) and three angles a, 6, y (between the axes b—c, a—c 
and a—b, respectively) which are collectively referred to as 


Face 


Origin 


Figure 6.27. The unit cell. A unit cell is determined by lattice 
constants consisting of three edges (a, b and c) and three angles 
between them; a(c — b), B(c — a) and y(a — b). This also forms 
six faces. A crystal consists of unit cells which are repeated in three 
dimensions. Many different unit cells could be chosen for a crystal 
but it is usual to choose the unit cell with shortest edge-lengths, 
angles nearest to 90° and with highest symmetry (Table 6.4). 
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Figure 6.28. Symmetry operations and symmetry elements in crystallography. A symmetry operation relates the location of one object 
(left) to that of another (right) in the crystal array. Six distinct symmetry operations are shown (a-f). The points, planes axes, and so on 
about which these operations occur are the symmetry elements. (a) Rotation. A counterclockwise rotation of 360°/n where n is 1, 2, 3, 4 
or 6. A twofold axis where n = 2 resulting in a 180° rotation is illustrated. ‘Out’ and ‘in’ denote that arrows are protruding out of and into 
the page, respectively. (b) Mirror plane. A reflection through a plane is shown by which one object is converted to its mirror image. (c) 
Inversion centre; objects are related through a point called the centre of symmetry. (d) Translation; objects are related by translation through 
a distance, d, in a single direction. (e) Screw axis; objects are related by a translation p/n (p is an integer < n) along the axis of rotation 
followed by a rotation as in (a) (f) Glide plane; objects are related by reflection as in (b) followed by a translation in a single direction as in 
(d) through a distance which is half the unit cell dimension parallel to the mirror plane. Because of the chirality of biomolecules, (b), (c) 
and (f) are not applicable and do not occur in crystals of protein and DNA. 


lattice constants. The unit cell may contain part of a 
molecule, a single molecule or several molecules of the 
species under investigation. To arrive at a final structure, 
however, it is essential that the volume of the unit cell is 
equal to or greater than the volume of one molecule. The 
unique part of the unit cell is called the asymmetric unit. 
This is the smallest part of the unit cell which lacks internal 
symmetry. Symmetry simply means that the location of an 
atom or molecule when subjected to a symmetry operation 
around a point, plane or axis may be converted into that 
of another atom/molecule in the same unit cell. The point, 
plane or axis is referred to as a symmetry element. The main 
symmetry operations and elements encountered in crystal- 
lography are summarized in Figure 6.28. In order to describe 
a crystal, several symmetry elements may be combined and 
a particular combination of symmetry elements is called a 
space group. 

The unit cell will frequently contain more than one pro- 
tein/DNA molecule. These may be related to each other 
by noncrystallographic symmetry. This is not part of the 
symmetry defined by the space group which is referred to 
specifically as crystallographic symmetry. 

It is important to understand that unit cells, symmetry 
operations, space groups, and so on are imaginary mathe- 
matical devices allowing us to describe a crystal uniquely 
and to simplify calculations. The crystal is in reality made up 
of hundreds of thousands of individual atoms and our goal 
in X-ray diffraction is to identify and quantify the three- 


dimensional arrangement of these atoms. A single crys- 
tal may potentially be described by several unit cells/space 
groups. This choice is to some extent arbitrary. In the ex- 
ample shown in Figure 6.29, it is clear that the simplest unit 
cell contains one object while a more complex possible unit 
cell for the same array contains two objects and may have 
higher symmetry. This is because the latter unit cell is a 
more ‘unique’ representation of the pattern in the crystal. In 
general, a unit cell is chosen which has the shortest possible 
lengths for a—c, the closest angles possible to 90° for the 
angles a—y and the highest possible symmetry. 

The crystal visible at the macroscopic level is composed 
of atoms/molecules repeated regularly in three dimensions 
according to a mathematical pattern called the crystal lat- 
tice. A lattice is defined as an array of points repeated 
periodically through three-dimensional space such that the 
arrangement of points around any one point is identical to 
that around any other point in the same array and in the 
same orientation in space. In fact, a crystal may be thought 
of as a succession of symmetry operations around symmetry 
elements on the asymmetric unit of the unit cell to produce 
a crystal lattice. Theoretical investigations of crystals in the 
nineteenth century arrived at the conclusion that there are a 
limited number of possible geometric arrangements in crys- 
talline arrays. 

Depending on the values and relationships of their lat- 
tice constants, seven main types of unit cell are possible 
which are called the crystal systems. In order of increasing 
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Figure 6.29. Symmetry in crystals. One possible unit cell (broken 
lines) contains one object per unit cell and is described as ‘primi- 
tive’. Another possible unit cell shown with solid lines contains two 
objects and therefore has higher symmetry. Both unit cells could 
be used to represent this two-dimensional array of objects but, in 
three-dimensional crystallography we select the unit cell with high- 
est symmetry. 


symmetry these are triclinic, monoclinic, orthorhombic, 
tetragonal, trigonal/rhombohedral, hexagonal and cubic. Six 
of these can form primitive lattices which are denoted by (P) 
(Table 6.3, Figures 6.29 and 6.30). The word ‘primitive’ 
(from the Latin primus for first) refers to the fact that the 
cell contains only one lattice point as illustrated in Figure 
6.29. In addition to the primitive lattices, seven further lat- 


Table 6.3. The seven crystal systems. 


Possible Bravais 


tices are possible. Two of these feature an extra point at the 
centre of one of the six faces. By convention, this face is 
denoted as the a—b face and such lattices are referred to as 
C-centred (C). Extra points on each of the six faces result 
in a face-centred lattice (F) of which two are possible. An 
extra point in the centre of the primitive unit cell is possible 
in three lattices which are known as a body-centred and de- 
noted as (/; from the German innenzentrierte for interior). 
In all, there are a total of fourteen Bravais lattices the char- 
acteristics of which are summarized in Table 6.3 and which 
are illustrated in Figure 6.30. It may be further calculated 
that there are 230 mathematically possible space groups. 
However, since proteins and nucleic acids are usually enan- 
tiomorphic (Section 3.5.2), only 65 of these are relevant to 
crystallography of biomolecules. 

Diffraction of X-rays by a crystal results in a pattern 
which is mathematically related to the pattern of the crystal 
lattice. One of the first steps in analyzing diffraction patterns 
is to assign the crystal to its specific space group in a Bravais 
lattice with maximum symmetry as shown in Table 6.3. This 
analysis also determines the shape and dimensions of the 
unit cell which are important parameters in calculation of 
structure from crystallographic data. 

An important point of difference between crystal lattices 
of biomacromolecules such as proteins and small molecules 
like NaCl is that real crystals of large molecules frequently 
consist of mosaics of many microcrystals aligned more or 
less in the same direction. This mosaicity has the result that 
diffracted X-ray beams from crystals of large molecules 
actually emerge as narrow cones rather than fine beams 
which means that they need to be collected over a very 
small angle in practice rather than at a single, fixed angle. 


6.3.3 Physical Basis of Crystallization 


The crystalline state is achieved when molecules are re- 
cruited in a highly ordered manner from the liquid phase 


Crystal system lattices Minimum symmetry Restrictions on lattice constants 
Triclinic P Single onefold symmetry axis a#b#c;a+#ß# 

Monoclinic P,C One twofold axis (// to b) a 4b £+ ca = y = 90°; B + 90° 
Orthorhombic P,C,I,F Three mutually Perpendicular twofold axes a +Æ b Æ c;œ = $ = y = 90° 
Tetragonal P, I One fourfold axis (// to c) a=b + c;a = f = y = 90° 
Trigonal P One threefold axis (// to c) a =b #4 ca = f = 90°; y < 120° 
Rhombohedral R One threefold axis (// to c) a=b=cja=B=y 490° 
Hexagonal P One sixfold axis (// to c) a =b #4 ca = f = 90°; y = 120° 
Cubic P, I, F Four threefold axes (along cube diagonals) a = b = c;«œ = $ = y = 90° 


Rhombohedral and Trigonal are closely related and usually regarded as a single. Trigonal, crystal system. Trigonal R, shown in Fig. 6.30, 
is Rhombohedral. A Trigonal P form (with a ‘primitive’ lattice) would resemble Hexagonal P except the y angle would be <120° rather 


than =120°. 
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Figure 6.30. The Bravais lattices. There are fourteen types of lattices possible, depending on the lattice constants (Table 6.4). These are 
designated as primitive (P), C-centred (C), face-centred (F) body-centred (I) or rhombohedral (R). 
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(i.e. solution) to the solid phase (i.e. crystal). This contrasts 
with phenomena such as precipitation in which molecules 
leave solution in a random manner forming an amorphous 
and disordered aggregate. Inorganic materials such as NaCl 
form crystals very readily but more complex molecules such 
as proteins and DNA will only crystallize under a compar- 
atively small range of chemical circumstances. In fact, we 
have a very limited understanding of biomacromolecular 
crystallization and it is difficult to predict before a crys- 
tallization trial whether or not crystals will be successfully 
obtained. For this reason the normal practice is to screen em- 
pirically many different sets of crystallization conditions to 
identify the very small number which will produce suitable 
crystals in a given case. 

Crystallization of biomacromolecules is highly depen- 
dent on the availability of pure sample. While techniques 
described elsewhere in this volume may be used to assess 
purity of protein and other samples (e.g. SDS PAGE of pro- 
teins, Section 5.3.1; reversed-phase HPLC of proteins or 
DNA, Section 2.6) these may not detect microheterogeneity 
within the sample. This could be due, in the case of pro- 
teins, for example, to variable patterns of glycosylation or 
disulfide bridge formation, N- or C-terminal heterogeneity 
or the possibility of several structural conformations for the 
sample molecule. Microheterogeneity due to differences in 
covalent structure can usually be detected by mass spec- 
trometry or electrophoresis (Chapters 4 and 10). 
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A process likely to result in microheterogeneity within 
the sample will tend to hinder crystal formation; hetero- 
geneity interferes with orderly recruitment into and efficient 
packing within the crystal. Frequently, it is easier to crys- 
tallize a protein in the presence of a low molecular weight 
ligand as this reduces conformational variability within the 
population of sample molecules. This process is referred 
to as cocrystallization. Good examples are cocrystallization 
of enzymes with inhibitors or of hormone receptors with 
hormones. The necessity for ultrapure sample also leads to 
a requirement for particularly low batch-to-batch variation 
during protein purification or oligonucleotide synthesis for 
crystallographic work. 

The general approach taken to crystallization is to ex- 
pose a concentrated aliquot of a sample of protein/DNA to 
a concentration of precipitant which is slightly below that 
necessary to precipitate the sample. The most widely used 
precipitants are low molecular weight poly-alcohols (e.g. 
hexane diol), salts (e.g. ammonium sulfate), high molecular 
weight polymers (e.g. polyethylene glycol) and organic sol- 
vents (e.g. methanol). At the beginning of the experiment, 
the sample solution, though concentrated, is still unsatu- 
rated (see the phase diagram in Figure 6.31). During the 
experiment the protein and/or precipitant concentration is 
increased slowly such that eventually the saturation point 
of the phase diagram is reached. This is the point where the 
solid and liquid states of the protein exist in equilibrium 
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Figure 6.31. Phase diagram for protein. The region demarcated by the lower curve represents the limit of solubility of a protein. This is 
characterized by equilibrium between the solid and liquid states of the protein. The region above this represents supersaturation which may 
be divided into a metastable and labile region. In the metastable region, crystals can grow while in the labile region stable nuclei can both 
form and grow. Supersaturation is characterized by nonequilibrium between the liquid and solid states of the protein which drives crystal 
formation. The hatched area shows the optimum part of the diagram for growth of crystals suitable for diffraction. 
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with each other (Figure 6.31). Any increase in solid protein 
here tends to be counterbalanced by dissolution of solid pro- 
tein with the consequence that crystals cannot grow at the 
saturation point. 

Formation of a crystal depends on the formation of a 
stable nucleus. This is initiated by two or more sample 
molecules assembling together to form a suitable ‘platform’ 
onto which other sample molecules can assemble. Ordered 
successive addition of sample under the nonequilibrium con- 
ditions of supersaturation leads to growth of the crystal in 
three dimensions until equilibrium is re-established. The 
driving force behind this process is formation of the max- 
imum number of intermolecular bonds such as hydrogen 
bonds and van der Waals interactions. Of course, there are 
many more ways for a protein/DNA molecule not to forma 
crystal since entropy in chemical systems tends to the max- 
imum. Therefore, it is much more common for a nucleus 
to grow to a certain point and then stop growing. Such a 
nucleus is called an unstable nucleus. Alternatively, sample 
molecules may add together in a disordered manner lead- 
ing to precipitation. Conditions leading to formation of a 
crystal of suitable size (minimum 0.5 mm) and quality for 
X-ray diffraction are rare and are usually only found after 
exhaustive screens of hundreds or thousands of individual 
conditions. 

If the protein concentration is increased beyond satu- 
ration, that is to supersaturation, a nonequilibrium state 
is achieved. At supersaturation, the tendency of the sys- 
tem to return to equilibrium forces protein into the solid 
phase, occasionally resulting in crystal formation. In prac- 
tical terms, the supersaturated state may be achieved by 
evaporation of solvent or by manipulation of temperature 
(Section 6.3.4). The supersaturated region of the phase di- 
agram may, in turn, be divided into metastable and labile 
regions (Figure 6.31). The metastable region is defined as 
a region where stable nuclei cannot form. If, however, sta- 
ble nuclei are introduced into the metastable region, they 
can continue to grow. This is distinguished from the la- 
bile region where stable nuclei can both form and continue 
to grow. 

The importance of this is that, if the supersaturated system 
is very far from equilibrium (e.g. far into the labile region), 
then very many stable nuclei can rapidly form in the early 
stages of crystallization but crystal growth then slows and 
halts as equilibrium is re-established. This results in a large 
number of very small and often imperfect crystals which 
are not suitable for diffraction (these are called ‘showers’ of 
microcrystals). Conversely, the nearer the system is to the 
metastable region, the fewer stable nuclei are formed and the 
slower crystallization proceeds in the early stage of the ex- 
periment. This is the situation desired in biomacromolecule 
crystallization as it results in a small number of large crys- 
tals of the high quality required for diffraction (Figure 6.32, 


see colour plate section). The behaviour of each individual 
protein or DNA molecule in the phase diagram is unique 
which is why hitting on optimum conditions resulting in 
the formation of a small number of large crystals is so 
difficult. 

Any variable affecting protein solubility will affect crys- 
tallization. For this reason, it is usual to establish a discrete 
set of chemical conditions which include a specific pH, 
temperature, buffer and precipitant. Very small differences 
in pH (e.g. 0.1 pH units) can sometimes make the difference 
between obtaining crystals or not. Similarly, crystallization 
is particularly sensitive to temperature and, since several 
months may be necessary for crystals to form, it is essential 
that samples are not exposed to fluctuations of tempera- 
ture. This is achieved by storing samples under conditions 
where temperature is tightly controlled (e.g. in temperature- 
controlled rooms, refrigerators or incubators). It has even 
been discovered that the earth’s gravitational field may hin- 
der the growth of crystals and protein crystallization trials 
in the near-zero gravity of space are now a common feature 
of space shuttle missions. Unlike inorganic molecules, pro- 
tein and DNA structures are large, complex and sometimes 
labile and their charge and shape characteristics may alter 
during the crystallization process in highly-individual ways. 
This is why crystallization of biomacromolecules is such a 
slow and somewhat inexact process. 

In addition to nongrowth of crystals, a number of other 
problems can be encountered during crystallization trials. 
Twinning can occur when two crystals grow together. Be- 
cause they consist of two distinct lattices, they cannot 
be used to determine structure. This phenomenon can be 
avoided by inclusion of organic solvents such as 1,4 dioxan 
in the crystallization buffer. Growth of larger single crystals 
can be encouraged by increasing the volume of the sample 
or by manipulating the conditions to slow crystallization 
(e.g. by decreasing the protein concentration). 


6.3.4 Crystallization Methods 


Because of the vast number of possible crystallization con- 
ditions (pH, precipitant concentration/identity, ion strength, 
temperature) and the limited amount of protein normally 
available, it is not realistic to assess every possibility. In- 
stead, a process called sparse matrix sampling has come to 
be widely used. This consists of a set of 25-50 individual 
conditions which have been chosen based largely on pre- 
viously successful crystallizations. These are now commer- 
cially available as kits and are often used in initial crystal- 
lization screens. Once one or a small number of conditions 
producing some crystalline material have been identified, 
further secondary screens using incrementally different con- 
ditions are then systematically carried out with the object of 
obtaining larger and better-quality crystals. 
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Crystallization methods provide a means to expose pro- 
tein/DNA samples to the supersaturated state usually by one 
or more of the following processes: 


1. Facilitating gradual evaporation of solvent (e.g. organic 
solvents). 

2. Achieving gradual desolvation of sample by poly- 
alcohols or salts. This is also achieved by the presence of 
polyethylene glycol or organic solvents which lower the 
dielectric constant of solvent and thus also alter the bulk 
properties of water. 

3. Variation of pH and temperature to reduce protein sol- 
ubility. These processes must occur under conditions 
which do not significantly alter the structure or stabil- 
ity of the sample molecule. 


We will now describe three of the principal crystallization 
methods currently in use. 

Microdialysis is a low-volume form of dialysis (Chap- 
ter 7) which is a process of diffusion across semipermeable 
membranes (Figure 3.33). A solution containing the desired 
crystallization condition is placed on one side of the mem- 
brane while the protein solution (5-50 ul) is placed on the 
other. Equilibration across the membrane gradually alters 
the properties of the protein solution to that of the crys- 
tallization condition. The properties of the crystallization 
condition can be gradually changed by altering the solution 


Membrane 


Protein 5-50 pl 


External 
solution 


Glass capillary 


Protein 5-50 ul Plexiglass 


~button’ 


Membrane 


External solution 


Figure 6.33. Microdialysis. Solution containing desired conditions 
(external solution) diffuses across a semipermeable membrane into 
the protein solution (liquid-liquid diffusion). Properties of the ex- 
ternal solution can be gradually varied leading to crystallization. 
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Figure 6.34. The ‘hanging drop’ vapour diffusion method. (a) 
Sample is suspended as a ‘hanging drop’ (3—10 yl) over a much 
larger volume (e.g. 500 u1) of the desired crystallization condi- 
tion. This equilibrates very slowly (days, weeks, years) with the 
drop through the vapour phase (hence ‘vapour diffusion’). (b) A 
24-well plate may be used to screen simultaneously 24 individual 
conditions. 


on that side of the membrane. Two formats of this technique 
are illustrated in Figure 6.33. One of these features a spe- 
cially designed commercially available plexiglass ‘button’ 
into which the protein sample is introduced. The other in- 
volves use of glass capillaries with dialysis membrane held 
in place across one end of the capillary. Both of these can 
be placed in the external solution to allow equilibration to 
occur. Microdialysis is especially suited to crystallization 
because diffusion across semipermeable membranes is rel- 
atively slow. 

Vapour phase diffusion features slow equilibration be- 
tween a bulk crystallization solution and a small volume 
(<10 ul) of concentrated (10-50 mg/ml) sample through the 
vapour phase. The most popular form of this technique is 
the ‘hanging drop’ method (Figure 6.34). A small volume of 
sample solution mixed 1:1 with crystallization solution is 
pipetted onto a glass microscope slide. This is then inverted 
over a bulk (500 ul) crystallization solution. Equilibration 
occurs between the bulk and drop solutions until eventually 
the chemical conditions in the drop are identical to those of 
the crystallizing solution. The availability of 24-well linbro 
plates allows simultaneous screening of 24 conditions using 
as little as 24 ul of sample solution per plate. 
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Figure 6.35. The ‘sitting drop’ vapour diffusion method. Sample (10-20 zl) is placed in a depression formed in a 9-well plate. This is 
supported over the crystallization solution in a sealed chamber (e.g., on an inverted petri dish). Sample equilibrates with the crystallization 


solution through the vapour phase. 


Closely related to this is the ‘sitting drop’ technique in 
which a drop of sample is placed in a depression on a plate 
which is then placed in the vapour phase of a crystalliz- 
ing solution (Figure 6.35). Again, equilibration is through 
the vapour phase and the conditions of the drop slowly be- 
come identical to those of the bulk crystallizing solution. 
This technique is useful for assessing a single crystalliza- 
tion condition with several protein/DNA samples but is not 
as useful as the hanging drop procedure for screening a large 
number of crystallization conditions. 

The third major crystallization technique is free inter- 
face diffusion which is similar to microdialysis except no 
semipermeable membrane is used (Figure 6.36). A crystal- 
lization solution is layered gently on the sample solution 
and diffusion slowly generates concentration gradients at 
the interface. This creates localized supersaturation with the 


Interface 


Protein solution 


Crystallisation solution 
Glass capillary 


Figure 6.36. Free interface diffusion. This is generally similar to 
microdialysis except no membrane is used. Concentration gradients 
are generated at the interface which gives local supersaturation. The 
technique is particularly useful in microgravity. 


possibility of crystal formation. This technique is especially 
useful in the microgravity of space. 


6.3.5 Mounting Crystals for Diffraction 


The above discussion makes clear that crystals of biomacro- 
molecules are difficult to obtain in a size and quality suitable 
for diffraction. A further step is necessary before collection 
of diffraction data however. This is mounting the crystal in 
a format suitable for exposure to the X-ray beam. Crystals 
of protein or DNA are far less stable than inorganic crystals. 
Unintended collision of the crystal with a solid surface (e.g. 
a needle or the edges of plasticware) can be sufficient to 
cause it to crack or fragment. Exposure to mild heat (e.g. 
from fingertips transmitted through glassware), air or any 
chemical condition different to those in which the crystal 
was grown can introduce distortions into the crystal making 
ituseless for diffraction work and wasting the time expended 
in growing it. 

Protein/DNA crystals are usually mounted in thin glass 
capillaries suitable for assembly and rotation in the X-ray 
beam (Figure 6.37). The small size of the crystal necessi- 
tates availability of a dissecting microscope to follow the 
manipulations required. A single crystal is introduced into 
the capillary suspended in a small volume of the solution in 
which the crystal was grown (the mother liquor) by aspi- 
ration from the reservoir in which crystallization occurred. 
The crystal is then dried by removing this liquid by cap- 
illary action through wicks of paper. A small amount of 
mother liquor is then placed on either side of the crystal and 
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Figure 6.37. Mounting crystals. (a) A single crystal (minimum 0.5 mm) is transferred to a glass capillary by aspiration. (b) The crystal is 
dried by removing mother liquor with paper wicks. (c) A small plug of mother liquor is applied on either side of the crystal before sealing 
capillary ends with wax. Capillary is then mounted on the goniometer head. This holds the crystal in the X-ray beam at all angles of rotation 
(arrow). (d) View of a crystal mounted in a capillary (1) attached to a goniometer head (2) in an X-ray beam path (3). The lead beam stop 


(4) and area detector (5) are also visible. 
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the ends are sealed in wax. This creates an environment in 
which the crystal is constantly exposed to a vapour phase 
from the mother liquor and is protected from exposure to air 
and attendant contaminants. 

The mounted crystal can now be assembled on a go- 
niometer head in the apparatus used for diffraction. This 
is a device which allows arc and lateral adjustments to the 
three-dimensional position of the crystal such that, at any 
rotation angle, the crystal remains in the X-ray beam. The 
crystal is also aligned in such a way that one of the axes a—c 
is orientated along the axis of the X-ray beam. This point is 
discussed further in Section 6.4.3 below. 

Some crystals deteriorate rapidly at room temperature and 
it may be necessary to freeze them before exposing them 
to the X-ray beam. This is achieved by flash freezing the 
crystal in the presence of a cryoprotectant to a temperature 
of —196 °C in cooled liquid propane. This is carried out very 
quickly to freeze water molecules in the crystal lattice into 
amorphous ice rather than into ice crystals (which would 
disrupt the lattice). Frozen crystals are mounted on loops of 
fine nylon rather than in capillaries and are exposed to the 
X-ray beam in a cold stream at —196 °C. 


6.4 X-RAY DIFFRACTION 
BY CRYSTALS 


Having described crystals, some of their symmetry prop- 
erties and how we can prepare them for X-ray diffraction 
work we will now turn to the other important components of 
the experiment by examining the nature of X-rays and their 
diffraction by crystals. This leads to a data-set consisting of 
thousands of individual diffraction spots each with a par- 
ticular intensity, location and phase. In Section 6.5 we will 
see how this data is used to generate a three-dimensional 
structure for the sample molecule. 


6.4.1 X-Rays 


X-rays are a form of high-energy electromagnetic radiation 
with wavelengths in the range 0.1-100 x 107!° m (Chapter 
3; Figure 3.7) which were originally called Roentgen rays in 
honour of their discoverer, Wilhelm Roentgen. They may be 
generated by bombarding a copper or molybdenum target 
(anode) with a beam of accelerated electrons at a potential 
of some 10000 eV (Figure 6.38). The energy of this bom- 
bardment is absorbed by the dislocation of electrons from 
the inner K and M atomic shells to outer atomic shells. 
When these electrons return to the ground state, X-ray ra- 
diation is emitted. For crystallographic studies, radiation of 
à = 1.542 x 107! m (=1.5 A; also-called K, rays) are se- 
lected by passage through a graphite monochromator or a 
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Figure 6.38. Source of X-rays. (a) Bombardment of a copper anode 
target by a beam of electrons excites copper electrons to outer atomic 
shells. (b) Return to the ground state results in emission of X-rays 
of a number of wavelengths. Ky rays (A = 1.542 A) are specifically 
selected by passage through a graphite monochromator or a nickel 
filter. 


nickel filter. X-rays having a single wavelength are desir- 
able in crystallography because they give a single pattern of 
strong reflections. 

Two main laboratory-scale X-ray sources are widely used. 
Sealed tube generators feature an evacuated glass tube 
which contains a filament (source of the electron beam) 
and a hollow target. They have the important advantages of 
ease of maintenance and replacement but produce beams of 
limited energy output (up to 3 kW) because too-powerful an 
electron beam could melt the target. Rotating anode gener- 
ators consist of a target anode which can be rotated relative 
to the filament. This means that, since the electron beam 
impacts on a constantly-varied cool part of the filament, 
higher energy electron beams can be used which results in 
higher-power output (up to 12kW). In addition to these, 
synchrotron sources are large-scale facilities which use a 
particle accelerator to produce a continuous spectrum of 
high-energy X-rays and these are described in more detail 
in Section 6.5.9. 


6.4.2 Diffraction of X-Rays by Crystals 


In order to determine molecular structure it is necessary 
to use X-rays because the wavelength of this radiation is 
of a similar order of magnitude as atoms and covalent 
bond-lengths. The diffraction pattern obtained contains the 
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information we use to calculate this molecular structure. 
However, the diffraction pattern is a two-dimensional array 
of ‘spots’ of particular position and intensity (Figure 6.26). 
In order to convert this into a three-dimensional structure it 
is necessary to collect a data set of many such diffraction 
patterns from a single crystal (the precise number required 
depends on the symmetry and other characteristics of the 
crystal). Repeated exposure to the high-energy X-ray beam 
can sometimes lead to deterioration of the crystal during the 
experiment which can preclude the collection of a complete 
data set. In such cases it is often necessary to rescreen in an 
attempt to obtain more robust crystals, perhaps in a different 
crystal form. 

The electrons of atoms are responsible for diffraction of 
X-rays. This is due to an interaction between the X-ray 
electromagnetic wave and the electron. The electric vector 
of the electromagnetic wave (Chapter 3; Figure 3.1) induces 
an oscillation in the electron of a frequency identical to 
that of the wave (Figure 6.39). This results in emission of 
secondary radiation of a wavelength and frequency identical 
to that of the incident X-ray but 180° out of phase with it 
which is known as coherent scattering. A related process 
called anomalous scattering can occur if the frequency of 
the incident beam happens to be near that of a naturally- 
occurring oscillation in the electron. This latter phenomenon 
can be useful in determining the phase of scattered beams 
as discussed below (Section 6.6). 

Since electrons are responsible for diffraction, there is a 
close relationship between the intensity with which incident 
X-rays are diffracted and the atomic number (Z) of the atom 
which diffracts them. This may be quantified by the scat- 
tering factor, f, which is related to the angle of incidence 


4 . S =, ie. — 
NK Fi C — io NX xy k - 
; ae, \ 
Ay \ 
i Naparal f | n Secondary 
0) cect Nucleus radiation 
= E ; pin =" / 
EN ETA / N = ly~Or / 
\ ' z 


Xe ft 


Figure 6.39. Scattering of X-rays by electrons. (a) Coherent scat- 
tering. The beam of X-rays induces an oscillation in the electron 
(e) which has the same frequency as the incident radiation. This 
results in emission of secondary radiation which is 180° out of 
phase with incident radiation. (b) Anomalous scattering. When the 
frequency of the incident beam (v) corresponds or is close to a 
naturally-occurring frequency in the electron, this results in emis- 
sion of radiation known as anomalous scattering. 


(0) between the incident beam and the plane containing the 
diffracting atom and the X-ray wavelength. f is in fact a 
function of sin 6/A and has a maximum value of Z when 
0 =0. 

The scattering factor is a ratio between the intensity of 
scattering of X-rays by an atom and that of a single electron 
in the same experimental circumstances. Relative values (in 
brackets) for some atoms of biological interest are; !H (1), 
12C (6), 16O (8) and “Fe (26). This means that heavy metals 
diffract with much greater intensity than other atoms and that 
hydrogen diffracts so weakly that it is usually not detectable. 
For this reason the actual structure we calculate from X-ray 
diffraction patterns is an electron density map representing 
the location of C, N, S, O and other nonhydrogen atoms. 
Because of the similar scattering factors for N, O and C 
it can be difficult to distinguish side-chains of aspartate, 
glutamine and threonine from each other in electron density 
maps of proteins. 

The detection system for scattered X-rays should record 
both the position of the diffraction spot and its intensity 
as well as the overall diffraction pattern. X-ray diffraction 
patterns may be detected on photographic film since X- 
rays, like visible light, interact with film causing deposition 
of metallic silver. The intensity of each spot can be esti- 
mated relative to standard spots or by using a densitometer. 
The film is mounted in a camera which is orientated in 
a known angle relative to the incident beam and crystal. 
Alternatively, diffracted beams may be detected using elec- 
tronic detectors such as the area detector. These convert 
spot intensities and position to an electrical charge which is 
recorded (Figure 6.40). Electronic detectors have a number 
of advantages over film; they are more sensitive which short- 
ens the time necessary to collect a diffraction pattern (and 
hence shortens the exposure time of the crystal to the X-ray 
beam). Moreover, the data can be stored electronically in a 
computer-readable form which eliminates any requirement 
for a densitometer and again saves time. Once the crystal 
has been exposed to the X-ray beam, the diffraction pattern 
is collected and stored electronically. The detector is then 
erased, the crystal rotated slightly and a second diffraction 
pattern collected. In this way, a data-set of diffraction pat- 
terns is built up for the crystal. Because most of the incident 
beam passes straight through the crystal without being scat- 
tered, this would saturate the detector limiting its sensitivity. 
Accordingly, the detector is shielded from the direct incident 
beam by a circular piece of lead called a beam stop. More 
recently, detectors with an X-ray-sensitive phosphor screen 
coupled to a charge coupled device have become available. 


6.4.3 Bragg’s Law 


As with all electromagnetic radiation waves, diffraction can 
result in interference between diffracted waves which results 
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Figure 6.40. X-ray diffraction pattern collected on an area detector. 
A diffraction pattern collected for lysozyme crystals on an electronic 
area detector is displayed on a silicon graphics computer. Note 
the different intensities and location of diffraction spots forming a 
unique pattern. These intensities are converted into structure factors. 
After rotation through a small angle (e.g. 1°), a further diffraction 
pattern is collected. Some spots increase in intensity while others 
decrease. A series of such diffraction patterns forms the data-set 
used in calculation of the electron density map. 


in the waves either reinforcing or weakening each other de- 
pending on their relative phases (Chapter 3; Figure 3.44). 
These phenomena are characteristic of wave functions and 
are known as constructive and destructive interference, re- 
spectively. In an X-ray diffraction experiment, a complicated 
pattern of scattering of X-ray beams is observed in which 
both of these processes occur simultaneously. The pattern 
of interference depends on the distribution of atoms in the 
sample through which X-rays pass. Since crystals are highly 
ordered arrays of atoms, systematic interference patterns oc- 
cur which may be thought of as ‘encoding’ information on 
relative three-dimensional atomic locations. 

In 1913 Bragg proposed that a crystal may be regarded 
as a series of planes which behave as ‘mirrors’ reflecting 
X-rays. In a real crystal, these lattice planes cut through the 
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Figure 6.41. Lattice planes. (a) Three lattice planes (dashed lines, 
1-3) are shown for a two-dimensional array. These planes link corre- 
sponding atoms through the crystal lattice. (b) Miller indices allow 
identification of individual planes. They are designated in terms 
of (hkl, where h = a/a',k = b/b',l = c/c' representing points 
where the plane transects the unit cell (dashed lines). The plane 
shown is designated 122 since the axes are cut at a/1, b/2 and c/2. 
The faces of the unit cell may similarly be designated as 100 (a), 
010 (b) and 001 (c). 


crystal lattice in three dimensions (a two-dimensional rep- 
resentation is shown in Figure 6.41). Miller indices provide 
a convenient way of identifying specific three-dimensional 
locations for such planes in the lattice. They are defined as 
the three intercepts (h, k, /) that a plane makes with the cell 
axes, in units of the cell edge. For example, if a plane in- 
tersects the cell axes a—c at points a’, b’ and c’, the indices 
are given by h = a/a', k = b/b' andl = c/c’ (Figure 6.41). 
Values for h,k and / are usually less than six for most 
crystals. 

Consider two such parallel lattice planes, each containing 
equivalent diffracting atoms and separated by a distance, 
d (Figure 6.42). Constructive interference between beams 
diffracted from successive planes occurs when there is an 
integral difference in à between them (i.e. nA, A, 2A, 3A, 
etc.). In Figure 6.42 this condition is met when 

PQ+QR=naA (6.12) 
where PQ and QR represent the distances shown. The rela- 
tionship between these distances and the angle of incidence 
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Figure 6.42. Bragg’s law of diffraction. (a) X-rays incident on suc- 
cessive lattice planes at an angle 8 are reflected when nà = 2d sin 8. 
where d is the distance between planes. (b) This is because succes- 
sive lattice planes which obey Bragg’s law give rise to constructive 
interference. Note that the diffracted wave from the upper lattice 
plane has greater amplitude than that from the lower plane as a 
result of constructive interference. 


of the X-ray beam (6), is geometrical and is given by 


PQ = QR =dsind (6.13) 


Substituting Equation (6.12) into (6.12) gives us Bragg’s 
law of diffraction: 


2d sin@ = nd (6.14) 


In practice this means that the diffracted beams detected 
in an X-ray diffraction pattern (sometimes called Bragg re- 
flections) are those originating from successive lattice planes 
which obey Bragg’s law and reinforce each other by con- 
structive interference with a single reflection arising from 
each equivalent lattice plane (Akl) in the crystal. All other 
diffracted beams undergo some destructive interference and, 
since this would happen through many equivalent lattice 
planes in a real crystal, are so weakened in intensity that 
they are not detectable. The only experimental variable in 
the Bragg equation is the angle of incidence, 6 (since d is 
a fixed property of the crystal for a given hkl and A is in 
most circumstances a fixed property of the X-ray beam). 
Variation of 6 is achieved by slight rotation of the crystal 
(approximately. 1°) between exposures to the X-ray beam 
allowing detection of a distinct pattern of Bragg reflections 
for each angle of rotation. Ina complete data-set, most atoms 
in the crystal lattice will contribute some Bragg reflections 
to some diffraction patterns. 
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Figure 6.43. Reciprocal space. Objects in the crystal lattice 
(p,q,r) are related reciprocally to objects in the reciprocal lat- 
tice (p’q’r’). The triangle shown on the left therefore gives rise to 
a different-shaped triangle as shown above. 


6.4.4 Reciprocal Space 


Bragg’s law makes clear that there is a fixed relationship 
between the pattern of Bragg reflections obtained during X- 
ray diffraction and the spacing of atoms within the crystal 
lattice. This relationship can be clarified further using the 
concept of the reciprocal lattice which is widely used in 
the study of diffracting systems. This is a real-space lattice 
(often referred to as reciprocal space) which is related recip- 
rocally to points in the microscopic crystal lattice. The re- 
ciprocal relationship means that macroscopic measurements 
made at a distance of 10—100 cm from the crystal may be 
directly related to microscopic spacings in the crystal lat- 
tice (Figure 6.43). We can imagine a diffracting crystal as 
generating a macroscopic lattice in three dimensions around 
it. In our collection of data as diffraction patterns we sam- 
ple this reciprocal lattice in two dimensions (since detectors 
such as photographic film or electronic detectors are two- 
dimensional). As the crystal is rotated, the reciprocal lattice 
also rotates and, since the detector is fixed during collection 
of a single data-set, this means that we can sample a con- 
siderable amount of the crystal lattice by rotating the crystal 
through 45—180°. 

A consequence of Bragg’s law is that there is an inverse 
relationship between the angle 0 and d (i.e. sin « 1/d). This 
means that large spacing between lattice planes (which we 
find in the large unit cells of protein/DNA crystals) result in 
small values for 0 while small spacing (of the type we find in 
small unit cells of small molecule crystals) give large values 
for 0. Because the unit cells for protein/DNA crystals are 
much larger than those for small molecule crystals, the dis- 
tance between Bragg reflection spots is significantly smaller 
in diffraction patterns obtained from crystals of biomacro- 
molecules. This problem can be overcome by setting a larger 
crystal-detector distance with protein/DNA crystals than is 
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Figure 6.44. Alignment of crystal. The inner ring of diffraction 
spots (zero-order spots) are shown for a mounted crystal. A single 
diffraction pattern is collected to check alignment (left). This shows 
misalignment since the zero-order spots are visible. The position 
of the crystal is altered to bring zero-order spots behind the beam 
stop (inner circle). This procedure is not necessary if the crystal is 
oscillated during collection as described in Figure 6.26. 


used with crystals of smaller molecules even though this re- 
sults in less intense spots in the resulting diffraction pattern. 

In order to maximize the information obtainable from X- 
ray diffraction while minimizing the number of diffraction 
patterns required, it is necessary to position the crystal in 
the beam such that its unit cells are orientated in a known 
way along the X-ray beam. This process is called indexing 
the crystal. Since the shape of the crystal is usually re- 
lated to the dimensions and orientation of the unit cell, it is 
sometimes possible to align one of the crystal’s (and hence 
the unit cell’s) main axes a—c along the beam. However, 
macroscopic inspection is not always sufficient to identify 
the orientation of the axes (Figure 6.27). In practice, the 
crystal is first mounted in the path of the X-ray beam as 
described in Figure 6.37 and a single diffraction pattern is 
collected (Figures 6.40 and 6.44). The position of the crystal 
is adjusted such that the inner ring of diffraction spots is not 
visible in the pattern because it is incident on the beam stop. 
This means that one of the unit cell axes, a—c, is now orien- 
tated along the X-ray beam although, at this stage, we do not 
know which one. Inspection of the diffraction pattern allows 
calculation of the dimensions of the reciprocal lattice and, 
hence tells us the orientation and dimensions of the crystal 
lattice axes. 

The Ewald construction can be used to understand this 
concept better (Figure 6.45). This describes the geometri- 
cal relationship between the orientation of the crystal, the 
direction of X-ray beams diffracted from it as Bragg re- 
flections and the reciprocal lattice. An X-ray beam incident 
at an angle @ on a lattice plane (hkl) obeying Bragg’s law 
at point C will be diffracted as shown. A circle centered at 
point C may be drawn with a radius of 1/A. This is called 
the Ewald circle but, for a real three-dimensional crystal lat- 
tice, a three-dimensional Ewald sphere would be generated. 


If the crystal is rotated such that a reciprocal lattice point 
touches the surface of this sphere at a point corresponding 
to that where the diffracted beam intersects the circle (e.g. P 
in Figure 6.45), the origin of the reciprocal lattice is defined 
as point O on the circumference of the circle. This construc- 
tion therefore allows us to relate points in the crystal lattice 
to corresponding points in the reciprocal lattice and each 
lattice plane (Akl) in a diffracting crystal to a specific Bragg 
reflection. 

Determining how many reflections we need to collect in 
a given experiment depends on the properties of the crystal. 
As described earlier (Section 6.3.2), crystals contain consid- 
erable internal symmetry. A consequence of this is that there 
is identical though reciprocal symmetry in reciprocal space. 
This means that identical reflections may result at different 
crystal rotation angles. This has three useful consequences: 


1. Symmetry in the pattern of reflections allows identifica- 
tion and measurement of the lattice constants. 

2. All reflections do not need to be collected for every crys- 
tal. 

3. Averaging of equivalent reflections gives more accurate 
estimates of intensities than measurement of single re- 
flections alone. 


It would be possible to rotate the Ewald sphere through 
a larger sphere in reciprocal space (radius 2//) called the 
limiting sphere. This contains all the Bragg reflections from 
the crystal which might amount to 2 x 10° or more. In prac- 
tice, even a very large protein rarely gives rise to more than 
10° unique reflections because the diffraction pattern is lim- 
ited to 0 values around 20-25° for Kæ rays. The diffraction 
pattern fades strongly at angles outside this range. 


6.5 CALCULATION OF ELECTRON 
DENSITY MAPS 


We have now seen how we can grow crystals and, by pass- 
ing a beam of X-rays through them, collect a data-set con- 
sisting of a number of individual diffraction patterns each, 
in turn, consisting of spots of intensity J,,;. Because the 
atoms in the crystal are regularly arranged relative to each 
other within the lattice, diffracted beams are scattered in a 
systematic way which is directly related to their distribu- 
tion within the crystal lattice. As this scattering is in fact 
caused by electrons, there is therefore a clear relationship 
between the pattern of Bragg reflections obtained in the 
data-set and the density distribution of electrons within the 
crystal lattice and, hence, within each individual molecule 
of the sample under investigation. Calculation of this elec- 
tron density distribution is the goal of X-ray crystallogra- 
phy since it allows us to visualize and quantify molecular 
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Figure 6.45. The Ewald construction. The incident X-ray beam is diffracted from lattice plane (Akl) at point C. When a Bragg reflection 
corresponding to this intersects with a circle (radius of 1/4) where a reciprocal lattice point touches the circumference of the circle, the origin 
of the reciprocal lattice is defined as O. The three-dimensional equivalent to this is the Ewald sphere. This allows us to predict detection of 
Bragg reflections and to orientate the crystal lattice with respect to the reciprocal lattice. 


structure. The three-dimensional arrangement of electrons 
within the molecule is represented as an electron density 
map. In this section we will describe how Bragg reflections 
may be used to calculate an electron density map for the 
crystallized molecule. 

A large molecule such as a protein can generate hun- 
dreds of thousands of individual Bragg reflections, much 
more than crystals of low molecular mass organic/inorganic 
molecules. This is because there are thousands of individual 
atoms in a protein. Moreover, the intensity with which large 
molecules diffract X-rays is significantly lower than that for 
small molecules which means substantially more weak re- 
flections. The calculation of an electron density map from a 
large molecule such as a protein therefore represents a sig- 
nificantly more difficult task than a similar calculation for a 
small molecule, even though the individual steps which need 
to be taken are substantially similar. In current practice, cal- 
culation of electron density maps is carried out with the aid 
of powerful computer programs the availability of which is 
largely responsible for the speed with which macromolecu- 


lar structures can now be determined compared to a decade 
or two ago. 

Three distinct pieces of information act as inputs to these 
programs. (1) The individual intensity of each spot corre- 
sponding to a Bragg reflection (Inx); (2) the overall pattern 
of spots obtained; (3) the phase of the X-ray waves cor- 
responding to each spot. We will now look at how each of 
these inputs contributes to generation of the electron density 
map. 


6.5.1 Calculation of Structure Factors 


In order to calculate an electron density map the intensity 
of each Bragg reflection, /),,;, is converted into a structure 
factor, Fhx (note; since this is a vector quantity on the 
Argand diagram, it is conventionally written hereafter in 
bold type). This has an associated amplitude (Figure 6.46; 
Section 3.1.1) denoted hereafter by | Fazıl. These intensi- 
ties, Inz, are directly measured by the detector (or recorded 
on photographic film and then converted into intensity as 
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Figure 6.46. Representation of diffracted X-ray beam. The wave 
function is defined by wavelength (A) and amplitude (A) but is 
related to a point of origin by the phase angle (n1). The amplitude 
is related to the square root of the intensity (J;4;) for each Bragg 
reflection and directly to the structure factor, Fri. 


described in Section 6.4.2) and converted using the follow- 
ing relationship: 


(6.15) 


where K is a scaling factor dependent on properties such as 
beam intensity, crystal size and other fundamental constants. 
L is the Lorentz factor which corrects for the geometry 
of the detection system. P is a polarization factor, which 
corrects for differences experienced by electric components 
of the X-ray wave parallel to the lattice plane (Akl) compared 
to those perpendicular to the plane (this phenomenon was 
previously encountered in our description of plane polarized 
light; Section 3.5). 

In practice, an observed estimate of the structure factor, 
Fhki obs, for each Bragg reflection is first made. Once a model 
of the structure has been proposed, this is then employed 
through many iterations of a series of computer programs in 
a process of refinement (Section 6.5.8) each of which gen- 
erates a calculated estimate of the structure factor, Fix cates 
based on the current model. The discrepancy between these 
two, Arnau, is given by; 

A Fit = Frit obs — Fait catc (6.16) 

A quantity called the R factor represents a summation of 
all the structure factors for the proposed structure giving an 
indication of how acceptable it is: 


Dw Al Farul 


ee ame (6.17) 
X Al Fikilovs 


The R-factor is an important criterion for the quality of 
the final structure and allows comparisons of goodness-of-fit 


between crystal structures obtained for different molecules 
or different crystal-forms of a single sample molecule. It is 
usually in the range 3-5% for small molecules but may be 
as high as 10-20% for protein structures. 


6.5.2 Information Available from the Overall 
Diffraction Pattern 


A number of useful and important pieces of information 
can be gleaned from the symmetry and relative intensities 
of spots observed within the overall pattern of diffraction. 
This is analysed by a computer program which gives an 
estimate of the lattice constants and the space group of the 
structure. We can estimate from this how many asymmetric 
units are in the unit cell and the volume of each. From 
this information in turn, we can usually obtain the number 
of sample molecules in the asymmetric unit. The ratio of 
volume of asymmetric unit: molecular weight contained in it 
is known as Vn and, for protein samples, this ratio is usually 
in the range 1.8-3.0 A?/Da. Assuming protein crystals have a 
density consistent with 40-60% solvent content, an integral 
number of molecules per asymmetric unit usually results 
from this calculation. If the precise number of molecules 
per asymmetric unit is still ambiguous however, the density 
of the crystal can be directly measured in a density gradient 
(Chapter 7) although this is technically difficult for protein 
crystals because of their mechanical fragility. 

Because of the greater number of calculations required 
with each extra biomacromolecule per unit cell it is desirable 
that this number is as small as possible for rapid structure de- 
termination (e.g. 1-4). Occasionally, high-quality diffract- 
ing crystals may contain 6-8 protein molecules per unit 
cell making calculation of a final structure almost impos- 
sible with current computational facilities. In such cases, 
rescreening may be necessary to obtain crystals of the same 
molecule with a smaller number of molecules per unit cell. 


6.5.3 The Phase Problem 


In Chapter 3 (Section 3.1.1) we saw how a wave function 
is defined by two terms; the amplitude and wavelength. 
Because in X-ray crystallography we are interested in the 
three-dimensional location from which a diffracted beam 
originates (i.e. the location of the diffracting atom) we need 
a third term fully to describe the diffracted X-ray wave called 
the phase angle, nx (Figure 6.46). This is defined relative 
to an origin (i.e. the point where a = b = c = 0) which, 
for a complete data-set, is of necessity chosen arbitrarily. 
Because each Bragg reflection results from a combination 
of wavelets scattered by all the atoms in the molecule, dif- 
ferent reflections have different phase angles which must 
be determined in order for an electron density map to be 
calculated. 
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‘ hakala 


Figure 6.47. Geometric description of diffracted beams for differ- 
ent hkl. Three beams are shown which are out of phase and have 
different phase angles. The structure factor F;,; is a resultant vector 
between the cosine and sine vectors A’ and B’. The angle formed 
between /FI);,,; and the horizontal is a,,4;. The relationship between 
the wave description (below) and geometric description (above) is 
shown. 


An alternative way to represent the wave function is as 
shown in Figure 6.47. The wave is represented as a line the 
length of which is proportional to the amplitude and equal 
to | Fazıl. A circle with a radius equal to | F),,;| represents one 
complete wavelength. The angle formed by the line with 
the horizontal is called the relative phase angle, a,x). It is 
clear from this representation that | F),,;| can be considered 
as a resultant vector between two vectors Aj,,, and By, 
representing cosine and sine waves, respectively. The phase 
angle is equal to the ratio of these vectors: 

$ = tanany = By /Ajg (6.18) 

The intensity is also geometrically related to their magnitude 
as 

2 2 2 

Thea = (Apul + Brul? = | Fic (6.19) 

While structure factors Fj,; may directly be obtained from 


the intensities of Bragg reflections detected in the diffraction 
pattern, we cannot obtain phase angles directly from such 


reflections. In other words, the diffraction data-set from a 
crystal contains the amplitude information but has lost phase 
information. Since we need to know both the phase and the 
amplitude of each Bragg reflection to calculate the electron 
density map this lack of phase information provides a barrier 
to structure calculation called the phase problem. A number 
of strategies have been developed for the recovery of phase 
information which we now describe in Sections 6.5.4 to 
6.5.6. 


6.5.4 Isomorphous Replacement 


This technique was developed for proteins by Kendrew 
(working on myoglobin in 1960) and Perutz (working on 
haemoglobin in 1968). It involves the introduction of a heavy 
metal atom such as mercury, platinum, uranium, and so on 
at a unique location within the molecule called single iso- 
morphous replacement. This may be achieved by soaking a 
crystal with a solution of a reactive chemical containing the 
heavy metal which then attaches to the biomacromolecule 
(Figure 6.48). It is important that this reaction does not 
result in an altered structure, crystal packing or unit cell 
size (hence the term isomorphous meaning having the same 


(a) 
Heavy metal derivative | 
(2 heavy atom sites) 


[eal] 
Native protein 


| Heavy metal derivative 2 
(2 different heavy atom sites) 


(b) 


Figure 6.48. Isomorphous replacement in proteins. A protein crys- 
tal containing two molecules per unit cell is shown. Isomorphous 
replacement involves soaking these crystals in a solution of a reagent 
containing a heavy metal which attaches to the protein. (a) A heavy 
metal or its chemical derivative is introduced in the protein. Several 
sites can be occupied as shown. (b) Heavy metal atoms or their 
chemical derivatives are introduced in a second experiment. The 
sites occupied should be different from those in (a). If only one 
heavy metal derivative is available (e.g. (a) or (b)) the method of 
single isomorphous replacement is used. If two or more derivatives 
are available, multiple isomorphous replacement is used. The latter 
gives better phasing. 
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Fy 


Feu 
OPH 


Figure 6.49. Single isomorphous replacement of a protein. The 
structure factor vectors for a native protein (Fp), heavy-metal mod- 
ified protein (Fpp) and heavy metal alone (Fy) are shown together 
with the corresponding relative phase angles (ap and apy). 


shape). X-rays are diffracted through this crystal and, since 
a heavy metal contains many electrons and therefore would 
be expected to have high scattering factors (see above), the 
location and phase of Bragg reflections from metal atoms 
can be identified by comparison with a data-set from the 
crystal lacking heavy metal. 

Figure 6.49 illustrates the relationship between structure 
factors for native protein (F),,;p), heavy-metal modified pro- 
tein (Fnx px) and heavy metal alone (Faxin) to each other as 
vector quantities: 


Frapa = Faae + Frkin (6.20) 


Each of these structure factors has a relative phase angle 
(Onx1) associated with it. Provided that we know the three- 
dimensional locations of the heavy metal (i.e. both op 4 
and | Fazlu), the relative phase angle for the protein and both 
|Freilp and | Fix:|pa (which are experimentally measurable 
from diffraction intensities), a),;p, can be calculated from 
the following equation: 


Oper = Cnr E COS! [(| Frelon — |Faele — |Finile)/ 
2| Frxilel Fnxilu] (6.21) 


The location of the heavy metal is arrived at using dif- 
ference Patterson maps. These are created using a Fourier 
synthesis (Appendix 2) denoted A P(uvw), using the Miller 
indices (hkl) and the observed structure factor amplitude 
differences A| Fiala = |Faetlpy — |Faslps 


AP(uvw) = n YOYOY AlFe 


x cos 2x (hu + kv + lw) (6.22) 


(a) 


x Ə g (x + uy + vz + w) 


(b) 
B to A (—u-v-w) 


„~ Origin Fr 


Ato B (u v w) 


Figure 6.50. Patterson maps. (a) The relative locations of two 
atoms, A and B are represented in terms of their cartesian coor- 
dinates (xyz). (b) A Patterson map represents their relative loca- 
tion in terms of vectors with the location of one atom of the pair 
placed at the origin. Note that two vectors are possible (uvw) and 
(—u — v — w). 


We shall see later that the form of this equation is very 
similar to that used for calculation of electron density maps 
(Equation (6.29) below) with the exception that it does not 
include a term for relative phase angle (œn). 

If two atoms in a unit cell are separated by a vector (uvw), 
there will be a peak in the Patterson map at (uvw). In terms 
of real space this represents a vector between two atoms, 
one at (xyz) and the other at (x + u, y + v, z + w) (Fig- 
ure 6.50). Patterson maps represent one end of the vector at 
the origin and the other at (uvw). We can therefore measure 
the magnitudes of u, v and w directly from Patterson maps 
although we do not know x, y or z. The height of the peak 
is approximately proportional to the product of the atomic 
numbers of the two atoms. Thus, the Patterson map of a 
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protein modified with a heavy metal is dominated by vec- 
tors between the metal and other atoms. 

A crystal containing a single heavy metal in each unit 
cell which diffracts X-rays systematically will generate a 
Patterson map dominated by peaks representing vectors to 
and from the heavy metal atom. However, as shown in Fig- 
ure 6.50, any pair of atoms yields two such vectors; (uvw) 
and (—u — v — w). In the case of a protein, this means that 
Patterson maps of a protein can generate two relative phase 
angles, Opxpa and Ængipg as Shown in Figure 6.51. One of 
these phase angles is true and the other imaginary. In order 
to distinguish which of the pair is true, it is necessary to 
include a second (and sometimes a third) heavy metal at an- 


fa) 


(b) 


Figure 6.51. The Harker construction. (a) Single isomorphous re- 
placement is represented as two circles, one with a radius of the 
structure factor for the heavy metal-derived protein (Fpy) and the 
other with a radius of that of the native protein (Fp). Note that two 
possible relative phase angles are possible (wpa and apg). (b) This 
ambiguity may be resolved by using a second heavy metal deriva- 
tive which will generate a third circle of radius Fpy2. The point 
where all three circles intersect represents the true Fp and hence 
identifies apa as the true wp for the protein data-set. 


other unique location in the unit cell. The effect of this may 
be seen using a geometrical construction called the Harker 
construction (Figure 6.51). This experiment is called mul- 
tiple isomorphous replacement. It results in a single value 
for the term o,;p (Equation 6.21), the relative phase an- 
gle, which is included in the equation for calculation of the 
electron density map (Section 6.5.7). 


6.5.5 Molecular Replacement 


Sometimes the structure of a protein or DNA molecule sim- 
ilar to that under investigation will have been previously de- 
termined by X-ray diffraction. Such cases arise, for example 
with mutants/wild-type proteins generated by SDM experi- 
ments, families of structurally related isoenzymes, proteins 
cocrystallized with different ligands and functionally related 
protein families. Even where identity between pairs of pro- 
teins is limited or localized, molecular replacement can be 
carried out on fragments of their structures providing an 
alternative means to solve the phase problem. 

Patterson maps contain two distinct types of vectors. Self 
vectors are vectors between pairs of atoms within each in- 
dividual molecule of protein or DNA while cross vectors 
are those connecting pairs of atoms in different molecules 
in the unit cell. In general, these two groups are readily 
distinguished from each other because self vectors are sig- 
nificantly shorter than cross vectors. If vector maps are cal- 
culated for a pair of molecules for which the structure of 
one is known (the model), then the set of self vectors will 
be identical in the vector map of the unknown molecule (the 
target) compared to that of the model. Usually, this rela- 
tionship involves rotation around a set of three angles with 
each individual rotation creating a new set of (x, y, z) axes. 
Eventually, these rotations bring many of the peaks in one 
Patterson map into agreement with those of the second. A 
faster way to do this is to rotate the Fourier transform of 
the model structure relative to the observed structure factors 
of the data-set from the target. The mathematical tools to 
carry out these operations are known as rotation functions. 
Their effect is to orientate the two models correctly relative 
to each other. 

The second stage in molecular replacement is to position 
the fragment with respect to the crystallographic axes and 
the origin of the unit cell. This is achieved with a translation 
function which moves the Patterson map for the target rela- 
tive to that for the model by translation but without chang- 
ing their relative orientation. As with the rotation function, 
the translation function is defined by three translations in 
the three dimensions. In all, six parameters therefore de- 
fine molecular replacement. The best translation function 
is identified by calculation of structure factors at various 
points in the asymmetric unit and comparing these with 
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observed values using the R factor described in Section 
6.5.1 (Equation (6.17)). 

Once preliminary estimates of the relative phases are 
available for the target and model structures, these can be 
improved using standard refinement procedures. If noncrys- 
tallographic symmetry is present in the crystal, that is sym- 
metry not described by the crystal space group, this can help 
or influence the refinement procedure For example, two sub- 
units of a protein may be related by a symmetry which is 
not crystallographic. If a phase has been determined for two 
molecules related by noncrystallographic symmetry within 
the space group, then their electron density maps will also be 
symmetrically related. Any differences in electron density 
between them may be due to an inaccurate phase deter- 
mination. If the electron density associated with the two 
molecules is averaged, however, then this must result in a 
more accurate set of phases. This process is known as real 
space averaging. A similar approach can be used to relate 
electron densities calculated for the same molecule crystal- 
lized in two different crystalline forms. Provided that they 
can be related by a translation function, their electron densi- 
ties can similarly be averaged and the phase improved. This 
approach has particular applications for very large structures 
such as viruses where extensive noncrystallographic sym- 
metry exists and many averaging steps can be gone through 
to improve phase determination. 


6.5.6 Anomalous Scattering 


The physical basis of anomalous scattering of X-rays has 
already been described in Section 6.4.2 (Figure 6.39). If 
the frequency of incident X-rays corresponds to a natu- 
rally occurring frequency of oscillation in the electrons of 
a diffracting atom, then anomalous diffraction can occur. In 
practice, most atoms in crystals of biomacromolecules will 
not undergo anomalous diffraction with fixed wavelength 
Ka X-rays. However, some X-ray sources (e.g. synchrotron 
sources; Section 6.5.9. below) are tunable in that they can 
produce X-rays across a range of wavelengths and hence 
frequencies (although see Example 6.5 for a case where this 
was achieved with a rotating anode source). Such sources 
can be used in conjunction with a very limited range of atoms 
which can undergo anomalous diffraction with wavelengths 
in the range 0.5-3.0 A. Some of these are biologically occur- 
ring atoms such as Fe and Ca but most are heavy metals in 
the atomic number ranges 20 (Ca) to 47 (Ag) and 50 (Sn) to 
92 (U) which can be introduced into the crystal by soaking. 
Multiple wavelength anomalous scattering can therefore be 
regarded as analogous to isomorphous replacement in that 
one or a small number of atoms are introduced into the crys- 
tallized molecule to allow determination of relative phase 
angles. A popular method is to replace a residue in the pro- 
tein of interest with an analogue containing a heavy metal. 


Good examples of this are replacement of (in proteins) me- 
thionine by selenomethionine (in which Se replaces S) and 
(in DNA) replacement of cytosine by 1-iodocytosine or of 
thymine by 5-iodouracil (Examples 6.4 and 6.5). This may 
be achieved by replacing methionine with selenomethion- 
ine in the media on which bacteria are grown to produce 
the protein or, in the case of DNA, by introducing the iodo- 
nucleotide in one step of the oligonucleotide synthesis cy- 
cle. Each of the metals capable of anomalous scattering has 
particular advantages and limitations. Among the most im- 
portant of these is the protein relative mass range in which 
they are useful. Most are limited to 10-14kDa while Hg is 
useful up to 77 kDa. 

The rather narrow range of X-ray frequencies currently 
possible is principally responsible for the limited number 
of atoms capable of anomalous diffraction. Combined with 
the M, limits of the technique, this means that anomalous 
diffraction is used less frequently than either isomorphous 
or molecular replacement as a means of determining relative 
phase angles. 

The scattering factor (fanom) for an atom scattering X- 
ray radiation anomalously is more complex than that for 
coherent scattering previously described in Section 6.4.2: 

faon = f + f +if” (6.23) 
where f’ and f” vary with the wavelength of the incident 
radiation. f is the unperturbed scattering factor and is, 
conversely, largely independent of wavelength. 

A major difference between anomalous and coherent scat- 
tering is the phase change between incident and scattered 
X-rays. In the case of coherent scattering, this is 180° (Figure 
6.39) which is the reason that diffraction patterns are cen- 
trosymmetric. This means that for every Bragg reflection 
(hkl) there is an equally intense reflection (—h — k — l). 
This is formalized in Friedel’s law which equates the inten- 
sity (Z) of this pair of Bragg reflections sometimes referred 
to as Friedel mates: 

Ina = l-n- (6.24) 

By contrast, anomalous scattering results in the diffracted 
radiation from each Friedel mate having a different phase 
with the result that Friedel’s law does not hold for anoma- 
lous diffraction (Figure 6.52). This is due to the term f” in 
Equation (6.23) which affects the structure factor of each 
Friedel mate differently (Figure 6.53). 

Because anomalous diffraction is dependent on wave- 
length, we can define a wavelength-dependent structure 
factor * Fp. This receives contributions from both coher- 
ent and anomalously scattering atoms and thus is com- 
posed of two vector components, F),,;7 (coherent scatter- 
ing) and “F,,; a (anomalous scattering) each of which has a 
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Example 6.4 X-ray Crystallographic Determination of Structure of a Protein-RNA Complex 


Ribosomes are complex macromolecular structures which involve extensive interactions between a number of individual 
proteins and rRNA. A functionally critical part of rRNA which is heavily conserved consists of a 58-nucleotide sequence 
from 23S rRNA in E. coli which interacts structurally with a ribosomal protein L11. This is the site at which elongation 
factors EF-Tu and EF-G bind and it is a known target for antibiotics which confirms its functional importance. 

In order to know more about this part of the ribosome, a complex consisting of the RNA-binding domain (76 residues) 
of L11 with a rRNA fragment containing the 58 nucleotide fragment was crystallized and its structure determined by 
X-ray crystallography using anomalous diffraction to solve the phase problem. For this purpose selenomethionine was 
incorporated into L11 in place of methionine by growing E. coli on media containing selenomethionine. Equimolar 
amounts of rRNA and L11 were used in crystallization trials. 

Cubic crystals of both native and selenomethionyl complexes (dimensions; 0.1 x 0.1 x 0.2mm) were grown suc- 
cessfully by vapour diffusion with the sitting drop method (Section 6.3.4) in 50 mM sodium cacodylate (pH 6.5), 15% 
polyethylene glycol 600, 80 mM magnesium acetate, 100 mM potassium chloride and 0.2 mM Co(NH3)Cl; at 37 °C. The 
unit cell dimensions were a = b = 150.68 A, c = 63.84 A and the asymmetric unit contained two complexes. Diffraction 
data were collected from the synchrotron source at Brookhaven National Laboratory, Brookhaven, New York. 

The data collected is summarized in the table shown in (1) below. Note that a number of different X-ray wavelengths 
were used in collecting data from the selenomethiony] crystals. The number of unique reflections is always less than the 
total because of symmetry in the crystal. Approximately 70 000-080 000 individual reflections were collected in each 
data set of which approximately 10 500-517 500 were unique. 

Refinement produced an overall R factor of 24% for all reflections in the resolution range 8-2.8 A and a model was 
built into the electron density map which is shown in (2). 


(1) 
Data-sets Native Selenomethionyl L11-rRNA complex 
1 (A) 1.13951 0.97914 0.97871 0.96675 0.98557 
Range of resolution (A) 20-2.8 10-3.2 10-3.2 10-3.2 10-3.2 
Measurements 89 078 72.499 72879 72 186 72579 
Unique reflections 17542 10481 10480 10509 10437 
Completeness 94.5 88.3 88.3 88.7 88.2 
R 24% 


Data collected from diffraction of L11-rRNA complex crystals. 


(b) 


Loop 1 

Pro27 
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Ser24.NH Helix 3 
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Loop 2 


(a) Structure of L11-rRNA complex. (b) Schematic diagram showing the interactions between individual amino acid residues and individual bases 


(Continues)_! 
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M (Continued) 


The structure determined for the complex is consistent with previous studies using site-directed mutagenesis, sequence 
alignments between different species and NMR. It shows a number of points where individual amino acid residues contact 
the rRNA fragment and explains the stabilization effect of L11 on the 58 nucleotide fragment of rRNA. A rRNA hairpin 
bulges into the major groove of another rRNA helix and this interaction is locked in place by a direct contact of L11 with 
A1088. 

Reprinted with permission from Conn et al., Crystal Structure of a Conserved Ribosomal Protein-RNA Complex. 
Science 284, 1171-1174 (1999), American Association for the Advancement of Science 


Example 6.5 Determination of the Structure of a Binding Domain of the Human Editing Enzyme ADAR1 
Bound to DNA 


Almost three decades after the structure of right-handed double helix B-DNA was elucidated by Watson and Crick, a 
separate left-handed form of DNA called Z-DNA was discovered which was of unknown biological function. Unlike the 
smooth helix traced by the phosphates of B-DNA which form relatively shallow major and minor grooves, the phosphates 
of Z-DNA trace a jagged helix which forms a single deep groove extending to the axis of the helix. 

An enzyme called RNA adenosine deaminase (ADAR1) was found to bind to Z-DNA quite strongly and specifically 
by band-shift assays (Chapter 5). This binding activity was localized in a specific portion of the protein (the Zą domain). 
Z-DNA is formed transiently in the wake of a moving RNA polymerase and is thought to be stabilized by negative 
supercoiling. ADAR1 is thought therefore to target actively transcribing genes. In order to understand this biological 
function in greater detail, an investigation of the detailed structural interactions between this protein and DNA was carried 
out using X-ray diffraction. 

Cubic crystals of the Z, domain (residues 133-209 of ADAR1) and an oligonucleotide of sequence (TCGCGCG), 
were obtained with ammonium sulfate as precipitant by vapour diffusion using the hanging drop method. The cell 
dimensions were a = b = 85.9A,c = 71.3 À. Single isomorphous crystals were obtained by including 5-iodouracil for 
thymine-1 (iodo-1) and 5-iodocytosine for 6-iodocytosine (iodo-6). Data was collected on a rotating anode X-ray source 
and the data collection details are summarized in (1) below. 


(1) 
Native Todo-1 Iodo-6 
Resolution range (A) 40-2.4 20-2.1 20-2.8 
Number of unique reflections 10 896 29702 12 566 


Completeness 99.2 99.9 99.9 


The phase problem was solved by anomalous diffraction and refinement of the data-set for iodo-1 (which diffracted to 
higher resolution than the other two) produced an electron density map which was then used to build a model of the 
complex which was refined to an R value of 22.3%. An overview of this model is shown in (2) below. From this it is clear 
that two Z, domains bind to the double-stranded Z-DNA without contacting each other. 


(2) 


Overview of the complex formed between the Z, domain and Z-DNA. The a@-3 helix and part of the -hairpin are responsible for direct contact with 
Z-DNA 


(Continues)_! 
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M (Continued) 
The detail of the interactions between individual Z, domains and Z-DNA is shown in 3(a) below while 3(b) is a schematic 
of these interactions. 


(b) 


a3 


f-hairpin 


(a) Detail of protein-DNA interactions. (b) Schematic diagram of interactions 


The structure makes clear that the complex is stabilized by an extensive network of hydrogen bonds which are either 
directly between protein and DNA or else are mediated through water molecules. The structure is consistent with previous 
site-directed mutagenesis and NMR studies and the overall fold is generally similar to functionally related proteins. 

Reprinted with permission from Schwartz et al., Crystal Structure of the Za Domain of the Human Editing Enzyme 
Adari Bound to Left-handed Z-DNA. Science 284, 1841-1845 (1999), American Association for the Advancement of 
Science. 
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(b) Akl 


Figure 6.52. Friedel’s law. (a) Coherent scattering results in a 
path-difference of 2p for both the (hkl) and (—h — k — l) re- 
flections. These give rize to equally intense Bragg reflections 
(Inki = I-n—k-1) in accordance with Friedel’s law. (b) Anoma- 
lous scattering results in a path-difference of (2p + q) for the 
(hkl) relection and (2p — q) for the (—A — k — l) reflection. Thus 
Inki 4 l-n-k-1. Friedel’s law therefore does not hold for anomalous 
scattering. 


relative phase angle, a, 4; and ap4; a, respectively, as shown 
in Figure 6.54. œnxı is the phase of the data-set under in- 
vestigation which is needed to solve the phase problem and 
compute an electron density map. 

The objective of the anomalous scattering experiment 
is the determination of a value for a4). The wavelength- 
dependent, coherent and anomalous structure factors are 
related as 


` Farı =° Frat + YID +f" /A)° Faza (6.25) 


Figure 6.53. Non-equality of Friedel mates in anomalous diffrac- 
tion. The structure factors for (hkl) and (—h — k — 1) reflections 
in anomalous diffraction are shown as F* and F7, respectively. 
Fo is the diffraction component due to nonanomalously diffracting 
atoms. The contribution of each of the three components of the scat- 
tering factor shown in Equation (6.23) are shown as f, f’ and f”, 
respectively. Note that Fo, f and f’ of the Friedel pair are simply 
symmetrical reflections of each other and thus would obey Friedel’s 
law. The nonequality in intensity arises from differences in f”. 


o 


= Fina 


A 


Figure 6.54. Structure factors in anomalous diffraction. The struc- 
ture factor determined at any wavelength (* F) is composed of co- 
herent scattering C Fhkıt) and anomalous scattering € Freia) com- 
ponent structure factor. The phase angles associated with these are 
dnki and apxia, respectively. The objective of the experiment is to 
determine apxy. 
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With a single kind of anomalous diffractor introduced 
into the crystal, the relationship between the vector and 
phase angle terms at any wavelength, A, is 


a 3 
Fna = ° Faia? + a° Fhet A2 
+ bC Frey r Enki a COS Aang) 


oo i 
+ CC Fyrir Fhki a SiN Aan) 


(6.26) 


where a, b and c are wavelength-dependent coefficients as 
follows: 


a=Af/f 
b=2(f'/f) (6.27) 
c=2f"/f) 
and 
Af = (f? ae fP 
Adak = (Onki — Ahk A) (6.28) 


The anomalous scattering experiment consists of collect- 
ing diffraction patterns from a crystal containing atoms ca- 
pable of anomalous diffraction at a number of wavelengths. 
This allows estimation of the coefficients a, b and c since 
they consist of f’ and f” which can be experimentally 
measured and f which can be calculated. Experimental de- 
termination of f” involves using the relationship shown 
in Figure 6.53 and measuring the different intensities of 
pairs of Friedel mates. f’ may be estimated because of its 
wavelength-dependence by comparison of intensities at a 
chosen pair of wavelengths. This leaves three unknowns 
to be determined; °F,,,;7, the structure factor for coherent 
scattering, °F zia, the structure factor for anomalous scat- 
tering and the Aap x term (nx — nna). Equation (6.27) 
can be simultaneously solved for these three unknowns us- 
ing data collected at different wavelengths. The value for 
°F,kıa allows determination of the position of anomalously 
diffracting atoms in the crystal which, in turn, allows deter- 
mination of ajxj4. Since we know Aa;,;, we can therefore 
estimate of;,,). 


6.5.7 Calculation of Electron Density Map 


The intensity of each spot in the diffraction pattern generates 
a structure factor Fi, for each Bragg reflection (Equation 
(6.15)). The relative phase angle, a4, is estimated using 
the methods outlined in Sections 6.5.4—6.5.6. These data are 
then used as inputs for a computer program which includes a 
mathematical tool called the Fourier synthesis (Appendix 2) 
to analyse the waves of all the reflections. This converts the 
Bragg reflections from the lattice planes (hkl) to electron 
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Figure 6.55. Use of Fourier synthesis in calculation of electron 
density map. Three Bragg reflections are shown for equivalent 
atoms in lattice plane (hkl). Depending on the angle of incidence 
(8), an electron density wave perpendicular to the lattice plane is 
generated from each Bragg reflection. Fourier synthesis provides 
a summation of these electron density waves from many Bragg 
reflections which gives a one-dimensional electron density map 
(bottom). 


density waves which are perpendicular to (hkl) and gives 
a summation of them (Figure 6.55). The electron density 
at any point (x, y, z) in the unit cell may be expressed as 
p(xyz) where x — z are expressed as fractions of a — c: 


p(xyz) =| DD ae ys | Freer Akl e727ilhx+ky+z) 
(6.29) 


where i? = —1 and V is the unit cell volume. 

Note that this expression includes the amplitude, | Fpzıl, 
and phase, ax, of each Bragg reflection as well as three- 
dimensional positions of points in both crystal lattice and 
reciprocal space (hx + ky + lz). Moreover, this equation 
makes clear that all atoms in the unit cell contribute to 
each and every Bragg reflection (£). A simplified one- 
dimensional example of this is shown in Figure 6.55 but, 
for a biomacromolecule such as a protein, the electron den- 
sity map generated is three-dimensional (Figure 6.56). 

A consequence of Equation (6.29) is that every Bragg 
reflection in the data-set results from contributions from 
every atom in the unit cell. In fact, the structure factor F),,; 
corresponding to each Bragg reflection is a Fourier series 
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Figure 6.56. Portion of an electron density map of a protein. Amino acid side-chains (solid lines) are modelled through the electron density 
map of a portion of a protein. Distances due to noncovalent bonds such as hydrogen bonds can be traced as dashed lines. Note the presence 
of a molecule of structural water (top middle) involved in such a hydrogen bond. 


consisting of individual wave functions from each atom in 
the unit cell (Appendix 2). This means that there is an ‘all- 
or-nothing’ aspect to structure determination by X-ray crys- 
tallography since the electron density map for the whole 
molecule is calculated from the whole diffraction data-set. 
In practice, the Fourier synthesis may be terminated once a 
large number of the reflections in the limiting sphere have 
been included. 

The next step in structure determination is to model the 
structure of the biomolecule (e.g. protein primary structure) 
by tracing it through the electron density map to get as good a 
fit as possible. This involves matching atoms in the molecule 
to peaks in the electron density map. Difficulties can some- 
times arise in distinguishing structural water molecules from 
metal atoms or other parts of the biomacromolecule’s cova- 
lent structure such as amino acid side-chains (Section 6.1.1). 
In the case of proteins and nucleic acids, extensive knowl- 
edge of the normal range of bond lengths and bond angles 
has been built up and this is useful in the modelling process. 
The result is a preliminary model of the protein or DNA 
molecule in which the outline shape of the molecule and 


the overall arrangement of its covalent backbone should be 
visible. 


6.5.8 Refinement of Structure 


All the inputs to the calculation necessary to produce the 
electron density map are subject to some experimental error 
which consequently results in error in the map. Moreover, 
the level of detail visible in individual electron density maps 
(known as the structure resolution) may vary due to factors 
such as the number of reflections included and disorder in 
the crystal which results in ‘smearing’ of electron density. 
For biomacromolecules, a resolution in the range 2-3 A is 
usual in a high-resolution structure although some structures 
of better than 1.5 A resolution have been reported. The pre- 
liminary model is usually of significantly lower resolution, 
however. To arrive at a high-resolution structure we need to 
improve the model. This is achieved by cycling through a se- 
ries of computer programs which compare the model to the 
diffraction data until any discrepancy between the two has 
reached a minimum in a process called structure refinement. 
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Two types of information act as inputs to refinement. The 
first is empirical knowledge about the chemical components 
of the sample molecule. In the case of proteins, this includes 
known ranges of bond lengths and angles, dihedral angles, 
preferred arrangements around chiral centres, planarity of 
aromatic rings and knowledge of noncovalent interactions 
such as van der Waals, hydrogen and electrostatic bonds. For 
DNA molecules empirical factors would include base-pair 
hydrogen bonding, base planarity and sugar puckering. This 
empirical knowledge can be used in two distinct ways dur- 
ing refinement to limit the number of calculations necessary 
while maximizing the accuracy of the resulting structure. 
One is to introduce constraints in the refinement which lim- 
its the number of parameters which need to be included in 
each set of calculations. For example, a chemical group or 
structure such as a benzene ring can be refined as a sin- 
gle entity with fixed geometry rather than as six individual 
carbon atoms. This significantly reduces the number of pa- 
rameters required to compute a refined model location for 
this group. A second approach is to use restraints or limita- 
tions on the values that individual bond angles and lengths 
can have. A range of values centred on an average is used 
for each common bond angle and length which allows adop- 
tion of a nonideal value if this contributes to an improved 
structure. 

The second source of information is the preliminary 
model derived from the X-ray diffraction experiment. The 
refinement process can therefore be thought of as adjusting 
the model of the biomacromolecule within the database of 
constraints provided by empirical knowledge to minimize 
any discrepancies between the model and experimentally 
observed crystallographic data. The molecular structure is 
represented as a collection of energy functions and the re- 
finement process is continued until a global energy minimum 
is reached (Figure 6.57). 

In refinement the model is changed in each iteration by 
altering the position (x, y,z) and mobility of each atom 
and recalculating structure factors. The mobility of each in- 
dividual atom in the crystal affects its contribution to the 
overall diffraction pattern in that the more mobile an atom 
is, the less strongly it contributes to the electron density map. 
Some atoms in the structure of a protein may be quite rigid 
while others may be remarkably free to move. In fact, flexi- 
ble regions are essentially ‘invisible’ in an electron density 
map. A consequence of this is that we need to include in 
our refinement a weighting factor for each atom to quan- 
tify its mobility. This is called the temperature factor and 
is denoted by B. Although temperature is indeed a source 
of vibration and hence mobility of atoms, this parameter is 
really a measure of the total mobility of the atom result- 
ing either from thermal vibrations or disorder in the crystal 
and is also known by the less specific term displacement 
parameter. 
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Figure 6.57. Refinement of a protein structure. (a) Least squares 
refinement leads to a decrease in the R factor as the preliminary 
model is altered with changes to the parameters describing the 
position and mobility of each atom. If the preliminary model is 
substantially correct, this leads to a refined structure. However, 
the model can become trapped in a local energy minimum. (b) 
Simulated annealing provides for a greater range of conformational 
parameters which is useful if the preliminary model is somewhat 
different from the true structure. 


Because the effect of atomic vibration is to increase the 
apparent size of the atom, this leads to a decrease in the 
extent of the diffraction pattern which is a consequence of 
the reciprocal relationship between crystal lattice space and 
reciprocal space. This effect can be quantified as an expo- 
nential function of the scattering factor: 

f = foe? (sin? 6/A7) (6.30) 
where @ is the angle of incidence of X-rays of wavelength A 
and fo is the scattering factor of a stationary atom. In crys- 
tals of salts and minerals B factors are quite low (1-3 A?) 
but they are higher in organic crystals (2—6 A?) and crystals 
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of biomacromolecules (20-40 A). In protein crystallogra- 
phy the variation of the temperature factor as a function of 
primary structure is frequently presented as a plot which can 
be useful in distinguishing flexible regions from more rigid 
parts of the structure. 

A measure of the progress of the refinement process is 
provided by the R factor which was defined in Section 6.5.1 
(Equation (6.17)). This represents a summation of the differ- 
ence between observed structure factors and those calculated 
from the model. However, the R factor is not an indication 
of the precision of the structure in that two models with the 
same R factor may have bond angles and lengths determined 
with quite different precision. 

In situations where the preliminary model is close to 
the true structure, the comparison may be made using a 
least squares procedure. This minimizes the sum of the 
squares of deviation of | Fix caic| from | Fhzi obs| for the whole 
structure. 

However if the model, though largely correct, is incorrect 
in parts, the least squares procedure can become ‘trapped’ 
in a local energy minimum (Figure 6.57). This is because the 
computer algorithms which carry out the least squares re- 
finement generally only make small alterations to the model 
on each iteration. To rescue the model from a local energy 
minimum may require extensive revision of the model. In 
such situations a procedure called simulated annealing is 
used. This simulates the effect of heating the molecule to 
give each atom much greater freedom of movement. The net 
effect of this is that the model can explore a much greater 
range of possible conformations and thus arrive at a global 
energy minimum. 

The range of R factors estimated for biomacromolecules 
are significantly larger (12-20%) than are obtained in small 
molecule crystallography (3-5%). A major reason for this is 
the difficulty of modelling the relatively high water content 
of protein and DNA crystals (which can be up to 50%). One 
successful approach to this problem is to model water in 
layers approaching the biomacromolecule with the solvation 
shell having the highest order and lowest mobility. 


6.5.9 Synchrotron Sources 


If electrons are accelerated to near the speed of light in aring- 
shaped particle accelerator called an electron storage ring 
and then quickly decelerated by forcing them into a curved 
trajectory, they emit their excess radiation in the form of a 
powerful stream of X-rays tangential to the ring which is 
called synchrotron radiation (Figure 6.58). The particle ac- 
celerators necessary to perform this function are large, ex- 
tremely expensive and are located in a small number of large- 
scale facilities internationally such as at Brookhaven (USA), 
Grenoble (France), Harwell (England), Tsukuba (Japan), 
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Figure 6.58. A synchrotron X-ray source. If electrons which have 
been accelerated in the electron storage ring are suddenly decel- 
erated, they emit a powerful beam of X-rays across a range of 
wavelengths. These are directed through a collimator which, to- 
gether with metal mirrors, produces a fine intense beam which is 
directed through the crystal for diffraction. Note the size of the 
electron storage ring which is similar to that of a football field. 


Trieste (Italy) and Hamburg (Germany). A web site con- 
taining links to the main synchrotron sources internationally 
is given in the Bibliography. Crystallographers can arrange 
“beam time’ to collect data-sets in these large-scale facilities 
and most of the high-resolution structures published in the 
scientific literature are determined from data collected in 
this way. 

The beam of X-rays generated from a synchrotron source 
has a number of characteristics which make it extremely 
useful in diffraction experiments. Compared to the beam of 
X-rays which can be generated from the laboratory sources 
described in Section 6.4.1, the beam from a synchrotron 
has some thousand-fold greater energy and is more highly 
collimated which means that the diffraction spots mea- 
sured are both smaller and more intense, thus enhancing 
the resolution and precision of the diffraction pattern. It also 
means that data-sets consisting of thousands of Bragg reflec- 
tions can be collected in minutes rather than the hours re- 
quired with laboratory sources. The synchrotron beam is ex- 
tremely narrow and intense allowing collection of data from 
much smaller crystals than can be used with a laboratory- 
scale instrument. Moreover, the beam is tunable and can 
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generate a wider range of wavelengths than laboratory-scale 
sources which may be used in multiple wavelength anoma- 
lous diffraction experiments for phase angle calculations as 
we have previously seen in Section 6.5.6. 


6.6 OTHER DIFFRACTION METHODS 


X-ray diffraction is the main means currently available 
for high-resolution structure determination of biomacro- 
molecules. However, other types of diffraction are also 
possible which have particular applications likely to be im- 
portant in the future. The principles involved in these exper- 
iments are generally similar to X-ray diffraction although 
each has particular attributes and limitations which we will 
now briefly describe. 


6.6.1 Neutron Diffraction 


Just as electrons diffract X-rays, so a beam of neutrons can 
be diffracted by the nucleus of the atom. This diffraction 
also results in a 180° change of phase in the diffracted beam 
but, because nuclei have a fixed location in space compared 
to the relatively diffuse electron cloud, greater structural 
detail can be observed than in X-ray diffraction. Neutron 
diffraction studies can be complementary to X-ray diffrac- 
tion in many ways as a result. It provides a means to locate 
the position of light atoms such as hydrogen which are not 
visible in electron density maps and also allows us to distin- 
guish individual isotopes from each other. Moreover, since 
electron distribution around atoms is not necessarily sym- 
metric, neutron diffraction can give more accurate estimates 
for bond lengths than those apparent in the electron den- 
sity map and, for example, can help locate hydrogen bonds 
more precisely than X-ray diffraction. Neutron diffraction 
can also detect thermal vibration in crystals which is a major 
source of disorder and hence scatter in the electron density 
map. Major constraints on the widespread use of this tech- 
nique, however, are the need for much larger crystals than 
required for X-ray or electron diffraction (~1 mm?) and the 
requirement of time at a nuclear reactor to access a neutron 
beam. 


6.6.2 Electron Diffraction 


Because the availability of three-dimensional protein crys- 
tals is a prerequisite for structure determination of proteins 
by X-ray diffraction, membrane-bound proteins are not di- 
rectly accessible to the technique. Notwithstanding the fact 
that approximately 30% of eukaryotic genes encode integral 
membrane-bound proteins, the high-resolution structures of 
only a handful have been determined to date compared with 
tens of thousands nonmembrane protein structures. Multi- 


dimensional NMR is also limited as a means of structure 
determination because of the large weight of detergent in 
detergent-solubilized membrane-bound protein combined 
with the M, limit of this technique (Section 6.2.8). This 
means that most of our knowledge of high-resolution pro- 
tein structure comes from water-soluble proteins. 

Electron diffraction provides an alternative means to 
study the structure of membrane proteins. It takes advan- 
tage of the fact that, while membrane proteins do not read- 
ily form three-dimensional crystals, they sometimes will 
crystallize in two dimensions (Example 6.6). While two- 
dimensional crystals cannot be mounted in an X-ray beam 
they can diffract electrons systematically and, when com- 
bined with suitable image analysis, can be used to determine 
three-dimensional structure. 

In a few instances two-dimensional crystals occur natu- 
rally such as those formed by bacteriorhodopsin of Halobac- 
terium halobium. However, in most cases it is necessary to 
screen a variety of conditions to grow crystals. Usually this 
involves exposing detergent-solubilized protein to a vari- 
ety of lipids until suitable conditions are identified or, al- 
ternatively, growing the crystals on lipid monolayers. The 
crystals are then mounted on a grid and exposed to a beam 
of electrons in an electron microscope at very low temper- 
ature and at low magnification. These conditions are nec- 
essary because too strong an electron beam can damage 
the structure of biological molecules. The diffraction image 
is then collected by a microdensitometer and analysed by 
Fourier synthesis (Appendix 2). Because of the limitations 
imposed by too weak an electron beam and imperfections 
in the crystal, the resolution possible with this technique is 
currently in the region of 7 A which is significantly lower 
than that possible with X-ray diffraction. However, since 
membrane-bound proteins tend to be large macromolecular 
complexes, this level of structural information can be highly 
informative about overall shape, dimensions and organiza- 
tion. More than 2000 structures have been determined in 
this way. 

A related method is single particle cryo-electron mi- 
croscopy (cryo-EM). This analyses electron micrographs 
from single particles of large structures such as ribosomes, 
viruses, ion channels and chaperonin complexes which do 
not readily crystallize. In excess of 100 such structures are 
presently known. The overall resolution is ~4.5-12 A which 
allows the main features to be visualized. Where higher- 
resolution structures at the level of domain or protein com- 
ponents are available from X-ray diffraction or NMR, these 
can be combined with cryo-EM to generate a high-resolution 
picture of the particle. An advantage of cryo-EM is that it 
makes possible viewing of large biological machines in a 
variety of functional states, for example by collecting time- 
lapse images. A disadvantage is that the beam of electrons 
used can damage the protein as mentioned above. 
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Example 6.6 Electron Diffraction by Two-dimensional Crystals of Aquaporin 1 


Rapid water transport across plasma membranes is facilitated by specialised water channels which transport water 
selectively to the exclusion of small ions and solutes. Aquaporin 1 (also called CHIP28) is a channel-forming integral 
glycoprotein of 28 kDa molecular mass which is expressed in erythrocytes and in cells of water-transporting tissues. 

In order to study the structure and organization of aquaporin 1, two-dimensional crystals were obtained for electron 
diffraction studies. The crystals (dimensions 1.5-2.5 um diameter) were obtained by reconstituting 2—4 mg/ml solubilized 
deglycosylated protein into synthetic lipid bilayers (dioleyl phosphatidyl choline) by dialysis in a lipid:protein ratio of 
1:1 to 1:3 (Chapter 7). The crystals were attached to an electron microscope grid stained with uranyl formate and dried. 
Electron diffraction was then performed and structure factors calculated. Five individual diffraction images containing 
reflections up to 12 A were aligned to a common phase origin and these phases and amplitudes were merged and used to 
reconstruct a projection density map at 12 A resolution of aquaporin 1 shown below. 


Averaged projection density map of aquaporin 1 


The unit cell (light dashed line) contained eight individual aquaporin molecules arranged in four dimers and with 
dimensions of a = b = 99.2 A. The tetramer (heavy dashed line) contains monomers orientated in the same direction 
and each monomer is in contact with a neighbouring tetramer (e.g. 1 with 1’). This structure is consistent with electron 
micrographs of freeze-fractured samples and allowed definition of the molecular boundaries and packing of aquaporin 
monomers. However, secondary structure features are not visible at this level of resolution and higher-resolution data 
was necessary for this. In later publications, structural data to a resolution of 6 A was obtained by electron diffraction 
which allowed a more detailed model to be constructed of secondary structure in the trans-membrane domains. 


Reproduced from Mitra et al. (1994) Projection structure of the CHIP28 water channel in lipid bilayer membranes at 12 A resolution. Biochemistry, 33, 
1273540, with permission of the American Chemical Society. 


6.7 COMPARISON OF X-RAY 
CRYSTALLOGRAPHY WITH 
MULTI-DIMENSIONAL NMR 


ture. Having described each of these techniques in some 
detail, we will now briefly look at the relationship between 
them. 


X-ray crystallography and multi-dimensional NMR are now 


the main experimental approaches to determination of three- 
dimensional structure of biomacromolecules. These tech- 
niques are independent of each other in both preparation of 
sample and the physical parameter used to calculate struc- 


6.7.1 Crystallography and NMR are Complementary 
Techniques 


In terms of sample accessibility and of the physical basis for 
structure determination, NMR and X-ray crystallography 
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Table 6.4. Comparison of points where NMR and X-ray crystal- 
lography are complementary approaches to structure determination. 


Point of Multi-dimensional X-ray 

comparison NMR crystallography 

Phase of sample Liquid Solid 

Component of Nucleus Electrons 
atom involved 

Mobility 

Rigid parts of Weak signal Strong signal 
structure 

Mobile parts of Strong signal Weak signal 
structure 

Time-resolution Dynamic Fixed 

Final structure Family of related Single refined 

structures structure 


are largely complementary techniques (Table 6.4). NMR 
requires sample molecules capable of existing at high con- 
centration under acidic pH conditions and therefore uses 
highly water-soluble samples present in the liquid phase. 
X-ray crystallography, by contrast, requires samples ca- 
pable of forming large single crystals in the solid phase. 
Where proteins have had their structure determined by both 
methods, identical structures have usually been found al- 
though the reliability of NMR-derived structures decreases 
markedly with increasing sample mass. This fact, together 
with the knowledge that many proteins (e.g. enzymes) re- 
tain their biological activity in the crystalline state and 
that the same protein structure is arrived at from differ- 
ent crystal-forms, gives us confidence that the structural 
form of biomacromolecular samples present in crystals is 
biologically meaningful and is not merely an artefact of 
crystallization. 

The two techniques ‘report’ on different parts of atomic 
structure since NMR is a phenomenon which depends on 
nuclei while X-ray diffraction is due to electrons. In NMR 
the location of protons which are largely invisible in elec- 
tron density maps is identified. Moreover, lack of mobil- 
ity in species responsible for an NMR signal frequently 
leads to a broadening of the signal similar to that ob- 
served for ESR signals (Section 3.8.3). This means that 
more flexible parts of structure produce relatively stronger 
NMR signals which is exactly opposite to the situation 
in X-ray crystallography where flexible parts of structure 
are much less visible than static parts. The two techniques 
therefore provide complementary information on structure 
which gives a much fuller picture than either method on 
its own. 


Time-resolved experiments in which dynamic processes 
such as protein tumbling, protein folding, domain movement 
and ligand binding can be followed as a function of time are 
particularly appropriate to multi-dimensional NMR. Move- 
ment in crystals can result either from static disorder in 
the crystal (i.e. an equivalent atom occupying a range of 
locations in different molecules) or from dynamic move- 
ment (i.e. movement of the atom during the time-frame of 
the diffraction experiment). This may be detected by low 
electron density in the electron density map. In principle, 
rapid data collection and analysis using synchrotron sources 
can also provide time-resolved information from crystals. 
However, in most cases, X-ray crystallography provides a 
single static structure because intermolecular interactions 
necessary for crystallization may severely limit the mobil- 
ity of groups on the surface of the biomacromolecule. This 
is an artificial situation since, in aqueous solution, surface 
groups are far more mobile than groups buried in the inte- 
rior of the protein. NMR is therefore more generally use- 
ful for providing a ‘moving image’ of structure while the 
image provided by X-ray diffraction is closer to a single 
‘snapshot’. 

Notwithstanding these points where the techniques are 
complementary it should always be borne in mind that 
only certain samples are suitable for both multi-dimensional 
NMR and X-ray diffraction. In the former case, samples 
must be generally small and highly soluble at acid pH 
without forming aggregates while, in the latter case, sam- 
ples must crystallize readily. However, many proteins are 
not readily amenable to study by either technique (e.g. 
membrane-bound proteins) while some may be studied by 
only one of them. This sample nonaccessibility is a major 
bottleneck in biomacromolecule structure determination in 
biochemistry. 


6.7.2 Different Attributes of Crystallography- 
and NMR-derived Structures 


Even though high-resolution structures result from both 
techniques, there are some important points of detail 
in which the structures differ from each other. Multi- 
dimensional NMR provides a ‘family’ of closely related 
structures which are all equally possible statistically. This 
contrasts with X-ray crystallography which, after an exten- 
sive process of statistical analysis, results in a single refined 
structure. Moreover, within an individual structure arrived at 
by multi-dimensional NMR, there is no indication of the reli- 
ability within the structure. By contrast, X-ray crystallogra- 
phy has robust and well-established methods for assessment 
of the reliability of each individual part of the structure. 
This means that, while the outline of the polypeptide chain 
may be clear in such structures, care may need to be taken 
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in drawing detailed conclusions on atomic locations from 
multi-dimensional NMR. 


6.8 STRUCTURAL DATABASES 


Structural information is normally disseminated to the in- 
ternational scientific community by means of published pa- 
pers in learned journals such as the Journal of Molecular 
Biology, Acta Crystallographica and Structure which are 
accessible in most university and institutional libraries and 
on-line. Such papers give detailed information on the origin 
and preparation of the sample, the method used to deter- 
mine the structure and especially interesting aspects of the 
structure such as its biological significance. However, struc- 
tures are also included in a structural database which is 
accessible through the worldwide web. The great advantage 
of this is that, with the aid of a suitable computer graphics 
program, the structure may be viewed in three dimensions 
rather than the two dimensions possible on paper thus giv- 
ing a more realistic impression of the structure and allowing 
focus on a part of the structure of specific interest. Many se- 
quence databases exist which allow searching, comparison 
and downloading of primary structure data (Chapter 9). The 
main repository of three-dimensional structural information, 
however, is the protein database (PDB). Because the reader 
may wish to view particular structures of biomolecules we 
will now briefly describe the PDB and how structures may 
be downloaded from it and viewed. 


6.8.1 The Protein Database 


Entries in the protein database arise from four main ex- 
perimental techniques; X-ray diffraction, multi-dimensional 
NMR, electron diffraction and cryo-EM. At the time of writ- 
ing (Dec. 2008) there are almost 50000 structures in the 
database of which some 84% are structures derived from 


Table 6.5. Total entries in the Protein Database (Dec. 2008). 


Technique 
Type of sample X-ray Diffraction NMR  Cry-EM 
Proteins, peptides 43520 6624 142 
and viruses 
Protein/nucleic 2011 140 50 
acid complexes 
Nucleic acids 1101 827 12 
Carbohydrates 24 7 0 
and others 
Total 46 656 7598 204 


X-ray diffraction (Table 6.5). From this table it is also clear 
that, in addition to protein structures, the database contains 
extensive entries for viruses, nucleic acids, carbohydrates 
and macromolecular complexes. Some 80% of the protein 
structures include closely related groups of structures such 
as mutants, protein families and individual proteins (e.g. 
there are over 1043 individual entries for lysozyme) which 
means that only some 20% of the structures are responsible 
for the range of some 50 000 protein structures covered. 

Since July 1999 the database has been managed by a 
consortium consisting of the National Institute of Standards 
and Technology (NIST, MD, USA) and the Universities of 
Rutgers (NJ, USA) and California (San Diego, CA, USA) 
called the Research Collaboratory for Structural Bioinfor- 
matics. Prior to this, it was operated by the Brookhaven 
National Laboratory and was previously referred to as the 
Brookhaven structural database. 


6.8.2 Finding a Protein Structure in the Database 


A tutorial program is provided to allow users to familiarize 
themselves with routine operations available. To visualize a 
structure from the database it is necessary to download the 
three-dimensional coordinates for the protein of interest. 
These may be located in three ways. Each structure is 
designated with a specific identification code called a 
PDB ID code (e.g. a 1.9A structure for glucose oxidase 
is denoted; 1GAL). Alternatively, a keyword option allows 
searching for the protein by name (and/or species of origin) 
while an author option allows searching by author(s) of 
peer-reviewed publications. A list of structures is generated 
from which the desired structure is selected. The coordi- 
nates are retrieved from the database and may be saved in a 
file on the personal computer (or attached peripheral) in use. 
To visualize the structure, it is necessary to use a suitable 
graphics program such as Rasmol which is available free 
on the internet. This allows presentation of the structure in 
various orientations and from various angles. The structure 
may be represented as wireframe, space-filling or ribbon 
models (Figure 6.59). Moreover, specific parts of the 
structure (e.g. from residue 10 to 30) can be highlighted 
in different colours or in a different representation as can 
nonprotein ligands and structural water. 

The PDB provides a major resource to the scientific com- 
munity which makes the hard-won structural data gained by 
the successors to Perutz, Kendrew and their colleagues in- 
creasingly readily accessible. Because of improvements in 
methods for generating and solving three-dimensional struc- 
tures of biomacromolecules and for disseminating such data 
electronically, there has been a perceptible acceleration in 
the pace at which the structural versatility and beauty of 
biomacromolecules is being revealed. 
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(a) (b) 


(d) 


Figure 6.59. Representations of glutathione transferase 1-1 by RASMOL. The coordinates for GST 1-1 were downloaded from the protein 
database and used to construct the images shown. One subunit is shown in black while the other is in grey. The various representations are 
(a) Wireframe, (b) backbone, (c) space-filling, (d) ribbon. (e) Single tryptophans (Trp-20) are represented by space-filling. (f) The ligand 
ethacrynic acid with which the protein was co-crystallized is represented by space-filling. 


THREE-DIMENSIONAL STRUCTURE DETERMINATION OF MACROMOLECULES 259 


REFERENCES 


General reading 


Branden, C. and Tooze, J. (1999) Introduction to Protein Structure, 
2nd edn, Garland Press, New York. A very readable overview of 
protein structure and methods used in its analysis. 

Campbell, I.D. (2002) The march of structural biology. Nature 
Reviews Molecular Cell Biology, 3, 377-81. A review of the 
development of structural biology. 

Fersht, A. (1999) Structure and Mechanism in Protein Science: A 
Guide to Enzyme Catalysis and Protein Folding, Freeman, New 
York. An overview of the relationship between protein struc- 
ture and function with a particularly good description of protein 
folding. 

Kleywegt, G.J., Henrick, K., Dodson, E.J. and van Aalten, D.M.F. 
(2003) Pound-wise but penny-foolish: How well do micro- 
molecules fare in macromolecular refinement? Structure, 11, 
1051-9. A study of how structures of small molecules are of- 
ten less reliable than those of macromolecules during structural 
refinement. 

Lesk, A. (2004) Introduction to Protein Science: Architecture, 
Function and Genomics, Oxford University Press, Oxford, UK. 
An excellent description of protein architecture with special em- 
phasis on information flow in Biology and on protein evolution. 

Rhodes, G. (2006). Crystallography Made Crystal Clear, 3rd edn, 
Academic Press, New York. A brief but extremely well-written 
explanation of the basics of X-ray crystallography of biomacro- 
molecules and how to read crystallography papers critically. 


The protein folding problem 


Baldwin, R.L. and Rose, G.D. (1999). Is protein folding hierarchic? 
I Local structure and peptide folding. Trends in Biochemical 
Sciences, 24, 26-33. A review of class I and II protein folding 
featuring simulations and experimental evidence from peptides 
and small proteins. 

Baldwin, R.L. and Rose, G.D. (1999). Is protein folding hierar- 
chic? II Folding intermediates and transition states. Trends in 
Biochemical Sciences, 24, 77-83. A review of hierarchical pro- 
tein folding focusing on folding intermediates in class I proteins 
and transition states in class II. 

Chiti, F. and Dobson, C.M. (2006) Protein misfolding, functional 
amyloid, and human disease. Annual Review of Biochemistry, 
75, 333-66. A detailed review on the role of protein folding in 
human disease. 

Cordes, M.H.J., Walsh, N.P., McKnight, C.J. and Sauer, R.T. (1999) 
Evolution of a protein fold in vitro. Science, 284, 325-7. An 
unusual example of disruption of overall protein fold by a small 
mutation. 

Greenfield, N.J. (2006) Determination of the folding of proteins 
as a function of denaturants, osmolytes or ligands using circu- 
lar dichroism. Nature Protocols, 1, 2733-41. A review of the 
application of CD to the study of protein folding. 

Mattos, C. and Clark, A.C. (2008) Minimizing frustration by fold- 
ing in a crowded environment. Archives of Biochemistry and 
Biophysics, 469, 118-31. A study of the role of water in protein 
folding. 


Munoz, V. and Eaton, W.A. (1999) A simple model for calculating 
the kinetics of protein folding from three-dimensional structures. 
Proceeding National Academy Sciences USA, 96, 11311-6. A 
simple model for calculating rates of 2-state folding from three- 
dimensional structure alone is described. 

Royer, C.A. (2008) The nature of the transition state ensemble 
and the mechanisms of protein folding: A review. Archives of 
Biochemistry and Biophysics, 469, 34—45. A review on transition 
states in protein folding. 

Schroder, M. and Kaufman, R.J. (2005) The mammalian unfolded 
protein response. Annual Review of Biochemistry, 74, 739-89. A 
review of links between protein mis-folding in the ER and signal 
transduction with the unfolded protein response. 

Murphy, R.M. and Tsai, A.M. (eds) (2006) Misbehaving Proteins: 
Protein (Mis)folding, Aggregation and Stability, Springer, Berlin. 
A description of protein aggregation including computational and 
experimental methods and implications for engineered proteins. 

Silva, J.L., Cordeiro, Y. and Foguel, D. (2006) Protein folding and 
aggregation: Two sides of the same coin in the condensation of 
proteins revealed by pressure studies. Biochimica et Biophys- 
ica Acta — Proteins and Proteomics, 1764, 443-51. A study of 
protein mis-folding under high pressure conditions leading to 
stabilization of folding intermediates. 

Soto, C. (2003) Unfolding the role of protein misfolding in neurode- 
generative diseases. Nature Reviews Neuroscience, 4, 49-60. A 
review of protein mis-folding in neurodegenerative diseases. 

Stanley, A.M. and Fleming, K.G. (2008) The process of folding 
proteins into membranes: Challenges and progress. Archives of 
Biochemistry and Biophysics, 469, 46-66. A review of folding 
of membrane proteins. 

Thomas, P.J., Qu, B.-H. and Pedersen, P.L. (1995). Protein folding 
as a basis of human disease. Trends in Biochemical Sciences, 20, 
456-9. A review of clinical conditions possibly arising as a result 
of incorrect protein folding. 


Chaperonins 


Corrales, F.J. and Fersht, A.R. (1996) Towards a mechanism for 
GroEL-GroES chaperone activity: An ATPase-gated and -pulsed 
folding and annealing cage. Proceedings National Academy Sci- 
ences USA, 93, 4509-12. A description of a proposed ‘gating’ 
mechanism allowing coupling of ATP-hydrolysis to folding of 
slow-folding proteins. 

Lin, Z. and Rye, H.S. (2006) GroELmediated protein folding: Mak- 
ing the impossible possible. Critical Reviews in Biochemistry and 
Molecular Biology, 41, 211-39. A detailed review of GroEL- 
mediated folding. 

Macario, A.J.L. and de Macario, E.C. (2007) Molecular chaperones: 
Multiple functions, pathologies, and potential applications. Fron- 
tiers in Bioscience, 12, 2588-600. Clinical relevance of chaper- 
ones. 

Young, J.C., Barral, J.M. and Hartl, F.U. (2003) More than folding: 
Localized functions of cytosolic chaperones. Chaperone-assisted 
protein folding in the cytosol. Trends in Biochemical Sciences, 
28, 541-7. An excellent review of cytosolic protein folding. 

Richardson, A., Landry, S.J. and Georgopoulos, C. (1998) The 
ins and outs of a molecular chaperone machine. Trends in 


260 PHYSICAL BIOCHEMISTRY 


Biochemical Sciences, 23, 138-43. A review of structure— 
function relationship in GroEL. 

Shtilerman, M., Lorimer, G.H. and Englander, S.W. (1999). Chaper- 
onin function: folding by forced unfolding. Science, 284, 822-5. 

Xu, Z.H., Horwich, A.L. and Sigler, P.B. (1997). The crystal 
structure of the asymmetric GroEL-GroES-(ADP)(7) chaper- 
onin complex. Nature, 388, 741-50. The crystal structure of the 
GroEL/GroES system. 


Multi-dimensional NMR 


Bonvin, A.M.J.J., Boelens, R. and Kapstein, R. (2005) NMR anal- 
ysis of protein interactions. Current Opinion in Chemical Biol- 
ogy, 9, 501-8. A review of how larger Mr samples, including 
complexes, are becoming amenable to multi-dimensional NMR 
investigation. 

Cavanagh, J., Fairbrother, W.J., Palmer, A.J. et al. (2007). Protein 
NMR Spectroscopy: Principles and Practice, 2nd edn, Elsevier. A 
detailed description of the physical basis of biomolecular multi- 
dimensional NMR. 

Glockshuber, R., Hornemann, S., Riek, R., Wider, G., Billeter, M. 
and Wuthrich, K. (1997) Three-dimensional NMR structure of a 
self-folding domain of the prion protein PrP(121—231). Trends 
in Biochemical Sciences, 22, 241-2. An example of structure 
determination by NMR. 

Ishima, R. and Torchia, D.A. (2000) Protein dynamics from NMR. 
Nature Structural Biology, 7, 740-3. A review of use of NMR in 
study of protein dynamics. 

Krishna, M.M.G., Hoang, L., Lin, Y. and Englander, S.W. (2004) 
Hydrogen exchange methods to study protein folding. Methods, 
34, 51-64. A description of hydrogen exchange methods for 
study of protein folding. 

Mittermaier, A. and Kay, L.E. (2006) Review — New tools provide 
new insights in NMR studies of protein dynamics. Science, 312, 
224-8. A review of recent method improvements for study of 
protein dynamics by NMR. 

McDermott, A. and Polenova, T. (2007) Solid-state NMR: New 
tools for insight into enzyme function and protein dynamics. 
Current Opinion in Structural Biology, 17, 617-22. A review of 
solid state NMR in the study of enzymes. 

Neuhaus, D. and Williamson, M. (1989) The Nuclear Overhauser 
Effect in Structural and Conformational Analysis, VCH Publish- 
ers, New York. A comprehensive text on the NOE. 

Rattle, H. (1995). An NMR Primer for Life Scientists, Partnership 
Press. An introduction to NMR for life scientists. 

Sanders, J.K.M. and Hunter, B.K. (1993). Modern NMR Spec- 
troscopy: A Guide for Chemists, 2nd edn, Oxford University 
Press, Oxford, UK. A non-mathematical introduction to multi- 
dimensional NMR. 

Tugarinov, V., Kanelis, K. and Kay, L.E. (2006) Isotope labeling 
strategies for the study of high-molecular-weight proteins by 
solution NMR spectroscopy. Nature Protocols, 1, 749-54. A 
review of use of isotope labeling to allow solution NMR of high 
Mr samples. 

Wu, Z.R., Ebrahimian, S., Zawrotny, M.E. et al. (1997). Solution 
structure of 3-oxo-A>-steroid isomerase. Science, 276, 415-8. 
A good example of structure determination by heteronuclear, 
multi-dimensional NMR. 


X-ray crystallography 


Chayen, N.E. and Saridakis, E. (2008) Protein crystallization: From 
purified protein to diffraction-quality crystal. Nature Methods, 5, 
147-53. A review of protein crystallization. 

Conn, G.L., Draper, D.E., Lattman, E.E. and Gittis, A.S. (1999). 
Crystal structure of a conserved ribosomal protein-RNA com- 
plex. Science, 284, 1171-4. A good example of structure deter- 
mination of a protein-RNA macromolecular complex. 

Darcy, P.A. and Wiencek, J.M. (1999). Identifying nucleation tem- 
peratures for lysozyme via differential scanning calorimetry. 
Journal of Crystal Growth, 196, 243-9. Differential scanning 
calorimetry (Section 8.3) can be combined with light scattering 
to identify a temperature range within which crystal nucleation 
is maximally possible. 

Dauter, Z. (2006) Current state and prospects of macromolecular 
crystallography. Acta Crystallographica D — Biological Crystal- 
lography, 62, 1-11. A comprehensive review of the current state 
of macromolecular crystallography. 

Dodson, E. (2008) The befores and afters of molecular replace- 
ment. Acta Crystallographica D — Biological Crystallography, 
64, 17-24. A review of molecular replacement. 

Glusker, J.P., Lewis, M. and Rossi, M. (1994). Crystal Structure 
Analysis for Chemists and Biologists, VCH, New York. A su- 
perbly readable, comprehensive and accessible description of 
X-ray crystallography. 

Karplus, P.A. and Faerman, C. (1994) Ordered water in macro- 
molecular structure. Current Opinion in Structural Biology, 4, 
770-6. A mini-review of structural water at protein surfaces. 

Ke, H.M. (1997). Overview of isomorphous replacement phasing. 
Methods in Enzymology, 276, 448-61. A review of isomorphous 
replacement. 

McPherson, A. (1990). Current approaches to macromolecular crys- 
tallisation. European Journal of Biochemistry, 189, 1-23. A re- 
view of the process of protein crystallisation. 

McPherson, A. (2004) Introduction to protein crystallization. Meth- 
ods, 34, 254-65. An excellent introduction to crystallization of 
proteins. 

McPherson, A. and Cudney, B. (2006) Searching for the silver bul- 
lets: An alternative strategy for crystallizing macromolecules. 
Journal of Structural Biology, 156, 387-406. A fascinating de- 
velopment in increasing the success of crystallization screens. 

Messerschmidt, A. (2007) X-ray Crystallography of Biomacro- 
molecules. Wiley-VCH Verlag GmbH, Weinheim. A compre- 
hensive description of current practice in X-ray diffraction. 

Qian, B., Raman, S., Das, R. et al. (2007) High-resolution struc- 
ture prediction and the crystallographic phase problem. Na- 
ture, 450, 259-64. A new modelling method which offers im- 
proved solution to the phase problem including de novo structure 
prediction. 

Schwartz, T., Rould, M.A., Lowenhaupt, K. et al. (1999) Crys- 
tal structure of the Za domain of the human editing enzyme 
ADARI bound to left-handed Z-DNA. Science, 284, 1841-5. 
A good example of structure determination of a DNA-protein 
macromolecular complex. 

Stewart, I. (1997) Crystallography of a golf ball. Scientific Ameri- 
can, 276, 96-8. An entertaining description of the mathematics 
of symmetry taking golf balls as a concrete example! 


THREE-DIMENSIONAL STRUCTURE DETERMINATION OF MACROMOLECULES 261 


Vrielink, A. and Sampson, N. (2003) Sub-Angstrom resolution en- 
zyme X-ray structures: Is seeing believing? Current Opinion 
in Structural Biology, 13, 709-15. A review emphasizing how 
high-resolution structural data contribute to our understanding of 
enzyme mechanism. 


Synchrotron X-ray sources 


Ealick, S.E. and Walter, R.L. (1993). Synchrotron beamlines for 
macromolecular crystallography. Current Opinions in Structural 
Biology, 3, 725-36. A review of the main synchrotron sources 
available internationally. 

Hendrickson, W.A. and Ogata, C.M. (1997) Phase determina- 
tion from multiwavelength anomalous diffraction measurements. 
Methods in Enzymology, 276, 494-523. Review of anomalous 
diffraction in structure determination. 

Hendrickson, W.A. (2000) Synchrotron crystallography. Trends in 
Biochemical Sciences, 25, 637-43. A review of synchrotron ra- 
diation and its uses in structural biology. 


Neutron diffraction 


Fitter, J.J., Gutberlet, T. and Katsaras, J. (eds) (2006) Neutron Scat- 
tering in Biology:Techniques and Applications, Springer, Berlin. 
A collection of applications of neutron diffraction to biological 
problems including crystallography. 

Niimura, N. and Bau, R. (2007) Neutron protein crystallography: 
Beyond the folding structure of biological macromolecules. Acta 
Crystallographica A, 64, 12-22. An excellent overview of the 
development and current status of neutron diffraction. 


Electron diffraction 


Lacapere, J.-J., Pebay-Peyroula, E., Neumann, J.-M. and Etchebest, 
C. (2007) Determining membrane protein structures: Still a chal- 
lenge! Trends in Biochemical Sciences, 32, 259-70. A review of 
structure determination of membrane proteins. 


Mitra, A.K., Yeager, M., Vanhoek, A.N. et al. (1994) Projection 
structure of the CHIP28 water channel in lipid bilayer membranes 
at 12 A resolution. Biochemistry, 33, 1273540. A good example 
of the application of electron diffraction. 

Walz, T. and Grigorieff, N. (1998) Electron crystallography of two- 
dimensional crystals of membrane proteins. Journal of Structural 
Biology, 121, 142-61. A fascinating review of electron diffraction 
as a means to study the structures of membrane-bound proteins. 

Weirich, T.E., Labar, J.L. and Zou, X. (eds) (2006) Electron Crys- 
tallography: Novel Approaches for Structure, Springer, Berlin. A 
good overview of applications of electron diffraction methods. 


Cryo-EM 


Topf, M., Lasker, K., Webb, B. et al. (2008) Protein structure fit- 
ting and refinement guided by cryo-EM density. Structure, 16, 
295-307. An article on combining high-resolution and cryo-EM 
data. 

Chiu, W., Baker, M.L., Jiang, W. et al. (2005) Electron cryomi- 
croscopy of biological machines at subnanometer resolution. 
Structure, 13, 363-72. A review of cryo-EM. 


Some useful web sites 


A movie of the GroEL ATPase cycle may be accessed at: http:// 
www.cryst.bbk.ac.uk/“ubeg 16z/cpn/elmovies.html. 

Sequence databases: http://www.ncbi.nlm.nih.gov. 

Protein database: http://www.rcsb.org. 

Cambridge Crystallographic Data Centre: http://www.ccdc.cam. 
ac.uk. 

List of synchrotron facilities worldwide: http://www.lightsources. 
org/cms/?pid= 1000098. 

Database of hydrogen and hydration in proteins (HHDB): http:// 
hhdb01.tokai-sc.jaea.go.jp/HHDB/. 


Chapter 7 
Hydrodynamic Methods 


Objectives 


of solutions and suspensions. 


e Analytical and preparative aspects of filtration. 
e Dialysis techniques and their uses. 
° The basis and applications of flow cytometry. 


After completing this chapter you should be familiar with: 


° The principal effects of biomacromolecules and particles such as viruses/cells on flow characteristics 


° The basis and main applications of centrifugation. 


Because of the importance of water as the universal bio- 
logical solvent, biochemical samples are usually handled 
and studied as aqueous solutions or suspensions. Some 
techniques described in this book involve interactions be- 
tween biological samples present in solution/suspension and 
some sort of immobile stationary phase. These experiments 
may be critically affected by the flow characteristics of the 
solution/suspension (e.g. column chromatography, Chapter 
2; capillary electrophoresis, Chapter 5). Moreover, common 
experimental manipulations in biochemistry such as cen- 
trifugation, dialysis and filtration are strongly influenced by 
flow interactions arising from hydrodynamic properties of 
the sample. These properties arise from physical interac- 
tions between molecules or particles and aqueous solvent. 
They are even more important in large-scale industrial sit- 
uations encountered in biotechnology such as downstream 
processing of protein products (Chapter 2; Section 2.5.2). In 
order to understand such experimental situations more fully, 
it is helpful to consider the physical basis of hydrodynamic 
properties. This chapter will summarize the principal hydro- 
dynamic properties of biomacromolecules and will describe 
their influence on flow-based experimental systems such as 
centrifugation, filtration, dialysis and flow cytometry. For 
simplicity, biomacromolecules (such as DNA and proteins), 
macromolecular complexes (such as ribosomes and viruses) 
and large biological entities (such as organelles and cells) 
are referred to as particles throughout this chapter. 


7.1 VISCOSITY 


Solutions/suspensions of particles are more viscous than 
the solvent alone. This alters the flow characteristics of the 


solution/suspension in a manner which depends on the con- 
centration, shape and mass of the particle. This can have 
important consequences for the handling of such solutions/ 
suspensions and may be used experimentally to learn about 
the shape and mass of the particle. 


7.1.1 Definition of Viscosity 


Movement of a particle across a stationary surface is inhib- 
ited by friction. The effect of this is to slow the movement 
of the particle. If the particle is present as a fluid (i.e. a 
liquid or gas), this friction gives rise to the phenomenon 
called viscosity. The molecular effects underlying viscos- 
ity depend partly on intermolecular interactions within the 
solution (e.g. van der Waals, hydrogen bonds, electrostatic 
interactions) and partly on the overall size and shape of sol- 
vent and solute molecules. Liquids are far more viscous than 
gases since intermolecular interactions are more common in 
the liquid than the gas phase. In the experimental situation 
described in Figure 7.1, flow of a liquid between two parallel 
plates separated by a distance, y, is shown. In 7.1(a), one of 
the plates is moving at a velocity, v, while the other plate is 
stationary and in 7.1(b) the liquid is moving with velocity, 
v, but both plates are stationary. Situations very similar to 
these could be expected to arise in many electrophoretic and 
chromatographic experimental formats using water as the 
main solvent. Since the velocity vectors of successive layers 
of water near to stationary plates would trace a parabolic 
shape, this phenomenon is called parabolic flow. 

Newton observed that the layer of liquid immediately 
adjacent to the moving plate in 7.1(a) would have a velocity 
very close to v while that adjacent to the stationary plate 
would have a velocity close to zero. Thus, a velocity gradient 
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Figure 7.1. Viscosity, (a) A linear shear gradient dv/dy is estab- 
lished between two plates one of which is moving at speed, v, while 
the other is immobile, (b) A parabolic shear gradient is formed 
when liquid flows between two immobile plates. 


(dv/dy) would exist across the solution between the two 
plates. He demonstrated that the shear force, f, between 
layers of liquid is proportional to both the area of the layers 
and the velocity gradient between them; 


faA.— (1.1) 


The constant of proportionality in Equation (7.1) is the 
viscosity, n, of the solution; 


dv 


f ane (7.2) 


In 7.1(a), friction is encountered at the stationary plate 
giving rise to a linear velocity gradient while, in 7.1(b), 
it is encountered at both plates giving rise to a parabolic 
velocity gradient. The term (f/A) is called the shear stress 
while dv/dy is the shear gradient. When n is a constant 
the fluid is said to be Newtonian while, if n is a function 
of either the shear stress or shear gradient, it is said to be 
non-Newtonian. 


7.1.2 Measurement of Viscosity 


Viscosity measurements are performed by measuring the 
time taken for a solution (volume, v) of a particle (e.g. a virus, 
organelle or cell), dissolved in a solvent of viscosity no to 
flow through a narrow capillary tube of radius, r and length, 
L under hydrostatic pressure. The most popular instrument 
is the Ostwald viscometer (Figure 7.2). The viscosity of the 


™ 


Figure 7.2. The Ostwald viscometer. Solution is added at opening 
A until the meniscus reaches point B. Suction is applied at point 
C until the liquid level is above point D. The time required for the 
meniscus of the liquid to flow from point D to point E is measured. 


sample solution, 7, is given by; 


m-h-g-p-r*-t 
= —— n 73 
: 8LV ve) 


where h is the average liquid height, g the gravitational 
constant and p the density of the solution. Since viscos- 
ity measurements of the solvent alone (7,) and of various 
concentrations of the particle (7) are carried out under iden- 


tical experimental conditions, they may be conveniently ex- 
pressed as relative viscosity, n,, of the solution/suspension; 


(7.4) 


Thus, knowledge of the density of the different solutions 
under study is the only prior information required for cal- 
culation of relative viscosity using the Ostwald viscometer. 


[n] 


"ep 


Concentration 


Figure 7.3. Measurement of intrinsic viscosity. Specific viscosity 
is measured at a range of concentrations of macromolecule/particle. 
Intrinsic viscosity ([7]) is the specific viscosity (nsp) at a concen- 
tration of zero. 


It has been experimentally observed that this parameter is 
dependent on the shape, concentration and volume of the 
particle under investigation. Consequently, viscosity pro- 
vides a simple means of determining aspects of the overall 
shape and mass of particles encountered in biochemistry. 


7.1.3 Specific and Intrinsic Viscosity 


An alternative, related measure to relative viscosity is spe- 
cific viscosity, Nsp; 


=m —1 (7.5) 


Nsp = 


To relate specific viscosity to intrinsic molecular charac- 
teristics such as shape and volume, it is necessary to deter- 
mine specific viscosity at a concentration of zero which is 
referred to as the intrinsic viscosity, [n]. This is achieved 
by measuring specific viscosity at a range of concentrations 
and extrapolating to zero (Figure 7.3). 

The Ostwald viscometer has the disadvantage that the 
shear gradient, G, cannot be varied. This is a limitation for 
non-Newtonian fluids where it is necessary to measure n 
under conditions where G = 0 (since in such fluids 7 varies 
with G). More sophisticated experiments are possible with 
the Couette viscometer which consists of two concentric 
cylinders with a narrow space between them for liquid (Fig- 
ure 7.4). This has previously been described in our discus- 
sion of experimental formats for the measurement of linear 
dichroism (Chapter 3; Figure 3.40). The value of G for such 


HYDRODYNAMIC METHODS 265 


(a) 
Static inner 
cylinder 


Rotating outer 
cylinder 


Liquid 


(b) 


Liquid 


Rotating inner 
cylinder 


Static outer 
cylinder 


Figure 7.4. Couette-type viscometers. (a) Couette viscometer fea- 
tures a static inner cylinder inside a mobile outer cylinder. In this 
design the shear gradient is fixed while the shear stress may be 
experimentally varied, (b) The floating rotor viscometer in which 
the outer cylinder is static and the inner cylinder is free to rotate 
while floating in the liquid. In this design, the shear stress is fixed 
while the shear gradient may be experimentally varied. 


an instrument is; 
G = —— (7.6) 


where R is the average radius of the cylinders, v is the 
rotor speed in revolutions per minute and d is the distance 
between the cylinders. The shear gradient is fixed while the 
shear stress may be calculated from; 


T 


F = ——— 1.1 
2x R?h OD 


where h is the height of the cylinder and T is the torque 
necessary to maintain the rotor speed, v. 

A modification of this basic design is the floating rotor 
viscometer in which the outer cylinder is static and the inner 
cylinder is free to rotate while floating in the liquid. In this 
design, the shear stress is fixed and the shear gradient is 
varied. 
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7.1.4 Dependence of Viscosity on Characteristics 
of Solute 


The relationship between [7] and molecular mass and shape 
is complex. However, generally speaking, the larger the 
molecule and the more extended its shape, the greater the 
value of [7]. An important factor in overall shape is the axial 
ratio (i.e. the ratio if the longest to the shortest axis for the 
molecule). Fibrous molecules have larger axial ratios than 
globular molecules of the same mass and therefore give more 
viscous solutions (Example 7.1). For double-stranded DNA 
the relationship between mass and [n] has been empirically 
determined as; 


log({n] + 5) = 0.665 -log M—2.863 (1.8) 


while for random coil proteins of n residues dissolved in 6N 
guanidinium hydrochloride it is; 


[n] = 0.716 - n° (7.9) 


Changes in viscosity may be used to follow processes 
in which the overall shape or elasticity of the molecule is 
altered. Examples of this include intercalation of drugs into 
DNA and protein denaturation (Example 7.2). 


7.2 SEDIMENTATION 


When particles are forced through a solution, they experi- 
ence resistance to movement which depends on properties of 
the particle such as mass, shape and density and properties 
of the solvent such as its temperature, viscosity, density and 
composition. A commonly-used experimental format where 
this occurs is sedimentation in a centrifugal field which is 
also called centrifugation. This can be used as a preparative 
strategy to separate complex mixtures present in biological 
samples. Alternatively, it can also be used analytically to 
determine the mass, shape or density of particles. 


7.2.1 Physical Basis of Centrifugation 


Consider a particle of mass, mo, suspended in a solvent 
of density, o. This particle would experience an upthrust 
equivalent to the weight of displaced liquid in accordance 
with Archimedes’s principle. The buoyant mass, m, of the 
particle is therefore mo minus a correction factor for this 
upthrust; 

m=M,—My-V- Pp (7.10) 
where v is the partial specific volume of the particle. This is 
equal to the volume displaced by the particle and is approx- 


Figure 7.5. Centrifugal force. A particle of mass, m, experiences 
a centrifugal force, F, when centrifuged at an angular velocity, œ, 
around a point from which it is distant by a radius, r. 


imately equal to the reciprocal of the density of the solute 
although the degree of particle hydration can affect it. For ex- 
ample, charged particles (which attract solvent molecules, 
compacting them in its vicinity) give smaller values of v 
than might be anticipated from particle density alone. 
If this particle is exposed to a centrifugal field (Fig- 
ure 7.5), it experiences a centrifugal force, F; 
F=m-o-r (7.11) 
where w is the angular velocity of rotation (in units of ra- 
dians/sec. with one revolution = 27 radians) and r is the 
distance of the particle from the centre of rotation (cm). 
Combining Equations (7.10) and (7.11) gives us; 


F=m(1—v-p)-@-r (7.12) 


This equation means that, as the mass, angular velocity 
or distance from the centre of rotation increases, so does the 
centrifugal force experienced by a particle 

As the particle passes through the solvent it experiences 
resistance due to a frictional force, F*, operating in the 
opposite direction; 

Fe=fev (7.13) 
where f is a frictional coefficient dependent on the shape 
and mass of the particle and the viscosity of the solvent 
and v is the velocity of the particle. At the beginning of 
centrifugation, the particle accelerates through the solvent 
but eventually the frictional force decreases this accelera- 
tion such that a constant velocity called the sedimentation 
velocity is achieved. At this point; 


F=F* 


ie. MO eS Py (7.14) 
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Example 7.1 Use of viscometry in the study of actin polymerization 


Chemical modification of actin in the lens of the eye is considered a major source of cataract formation leading to 
blindness. A primary source of the modification known as carbamylation is exposure to high levels of urea in the blood 
(uremia) which can be caused by certain clinical conditions (e.g. kidney failure, diarrhaea, dehydration). A possible 


mechanism of actin carbamylation involves isomerization of urea into ammonium cyanate followed by carbamylation of 
actin lysines: 


UREA AMMONIUM 
CYANATE 
NH2 NH,* 


NH) Cc 
II 
O 
iL? 
N 
I 
i 
we Nia 
(ra Hs 
B i B i 
co co 


This hypothesis was tested by exposing actin to the powerful carbamylating agent methylisothiocyanate and studying 
the effect of this on actin polymerization by viscometry using the Ostwald viscometer. Since polymerization results in 
a much larger protein assembly, it is to be expected that a solution containing such an assembly would be more viscous 
than one containing unpolymerized actin. Samples of various mixtures of carbamylated and uncarbamylated actin taken 
at various time-points over 24 min were analysed in the Ostwald viscometer and data for specific viscosity are presented 
in the following graph. 


Specific viscosity of control and carbamylated actin as a function of time. 


= Control (0.250 mg/ml {5.89 uM) actin) 
— 100% carbamylated actin (0.250 mg/ml {5.89 uM) 
** S0% carbamy lated actin (50% normal actin) 

mm 25% carbamylated actin (75% normal actin) 


Specific viscosity (l/l — 1.0) 


0 2 4 6 8 10 12 14 16 18 20 22 24 


Time period (minutes) 


(Continues)_! 


268 PHYSICAL BIOCHEMISTRY 


M (Continued) 


cytoplasm. 


The sedimentation velocity is given by; 


= m,(1 —v 2 “Orr (7.15) 


Sedimentation velocity is dependent on the shape of the 
particle but, for a perfect sphere, this quantity is related to 
physical properties of the particle and solvent by; 


2 
a — Pm 
y= El Phl 0 


187 (7.16) 
where a is the particle diameter, pp and pm are the densities 
of the particle and solvent, respectively and 7 is the vis- 
cosity of the solvent. The inverse relationship with viscos- 
ity means that sedimentation velocity decreases markedly 
with increase in solvent viscosity and (since viscosity is de- 
pendent on temperature) with temperature. For this reason, 
sedimentation velocities are usually corrected for different 
solvent composition and temperature to standard conditions 
of pure water at 20°C. 

The relationship described by Equation (7.16) underlies 
all centrifugation experiments in biochemistry and is the rea- 
son why different particles (i.e. with differing partial specific 
volumes and mass) can achieve individual sedimentation 
velocities in a single solvent at a single angular velocity. It 
provides the basis for differential centrifugation in which 
particles may be conveniently separated from each other. 

It is difficult to determine v, pọ and f accurately so a 
useful measure of differential behaviour in a centrifugal 
field is provided by the sedimentation coefficient, s; 


v _ m(l—v-p) 


roe 7 (7.17) 


Since both m,(1 — v - p) and f are inherent properties of 
the particle and solvent, respectively, s is a constant for any 
given particle/solvent. The range of values observed in bio- 
chemistry varies widely from as low as 1 x 107! s for a 
protein to 10000 x 107! s for intracellular organelles. A 
more useful notation is provided by the Svedberg, S, with 


This confirms that carbamylation abolishes polymerization and thus demonstrates that lysines are essential for this 
process. Moreover, inclusion of carbamylated actin with unmodified protein inhibits polymerization which raises the 
possibility that small amounts of carbamylated protein can seriously alter rates and extent of actin polymerization in the 


Reprinted from Kuckel et al. (1993) Methylisocyanate and Actin Polymerisation: The In Vitro Effects of Actin 
Carbamylation, Biochimica et Biophysica Acta 116, 2143—2148, with permission from Elsevier Science. 


one Svedberg equivalent to 1 x 107! s. There is a loose 
relationship between sedimentation coefficient and molecu- 
lar mass in that a protein such as cytochrome C (1S) has 
a smaller value than bacterial ribosomes (70S), tobacco 
mosaic virus (400S) or lysosomes (10 000S). However, this 
relationship is not a simple or linear one since, for exam- 
ple, the 70S ribosome is composed of particles of 50S and 
30S, respectively. Sedimentation coefficients are nonethe- 
less useful indicators of relative size especially in massive 
particles such as viruses. 


7.2.2 The Svedberg Equation 


The frictional coefficient is related to a number of other 
physical parameters; 


f=—— (7.18) 


where R is the universal gas constant, N is Avogadro’s 
number (Appendix 1), T is the absolute temperature and D 
is the diffusion coefficient. Combination of Equations (7.17) 
and (7.18) gives us the relationship between s and these 
physical parameters; 


o 1— e 
mU =p) (7.19) 
R-T/N-D 
This gives us; 
R-T-s=N-D-m,(1—v- p) (7.20) 


But molecular mass, M,, is the product of m, and N so; 


R-T-s 


DEAT (7.21) 


r 


This is the Svedberg equation which relates molecular 
mass to intrinsic physical properties of the particle (s and 
v), experimental conditions (D, T and p) and a physical 
constant (R). Since many of these terms are experimen- 
tally measurable, this equation provides us with an a priori 
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Example 7.2 Use of viscometry in the study of ligand binding to DNA 


Noncovalent binding of ligands to DNA may occur by a number of distinct mechanisms. One of the most important 
of these is intercalation into the double-helix as exemplified by some planar compounds such as the fluor ethidium 
bromide (Chapter 3) and the dye aniline. Intercalation can have major biological consequences such as the introduction 
of mutations which underlies the antibacterial activity of molecules like aniline. 

The monovalent cationic complex bis(1,10-phenanthroline) copper (I) [abbreviated to; (Phen) Cu!] has the unusual 
property that it binds noncovalently to DNA and this process results in cleavage which shows some degree of sequence 
specificity. This makes it interesting as a probe for local DNA conformation. 

Intercalation into DNA makes the molecule far more rigid and thus significantly increases the viscosity of DNA 
solutions. The reason for this is that intercalation results in greater distance between base-pairs leading to an apparent 
increase in length of the molecule. Viscometry is therefore a useful measure of intercalation. 

Measurements of specific viscosity were performed in a capillary viscometer on 800 bp lengths of DNA in the 
presence of the following reagents; (1) Ethidium bromide; (2—4) phenanthrolene mixed with CuSO, in ratios 1:1, 2:1 
and 4:1, respectively; (5) 2:1 2,9 dimethyl-1,10-phenanthrolene:CuSO,; (6) phenanthrolene alone. The mixture of 
phenanthrolene (in the ratio 2:1) and 2, 9 dimethyl-1,10-phenanthrolene with CuSO, generates the (Phen);Cu! and 
bis(2,9 dimethy]-1,10-phenanthrolene) copper(I) complexes, respectively. 

The results are expressed as ratios of DNA specific viscosity (hsp) in the presence (Cu) and absence (Cu = 0) of 
ligand. 

These data demonstrate that ethidium bromide and phenanthrolene-copper complexes increase DNA specific vis- 
cosity as a result of intercalation. Thel:1 ratio of phenanthrolene: copper generates mainly (Phen)Cu! while the 2: 1 
ratio generates (Phen),Cu!. The 4: 1 ratio contains considerable free phenanthrolene which inhibits (Phen),Cu! binding 
to DNA. Both phenanthrolene alone and bis(2,9 dimethyl-1,10-phenanthrolene) copper(I) complex resulted in neg- 
ligible intercalation. Spectroscopic measurements confirmed that neither of these reagents binds to DNA. The pres- 
ence of two methyl groups on each phenanthrolene substituent of the (Phen),Cu' complex appears to be sufficient to 
abolish intercalation which is thought to be due to structural rigidity in the molecule and the difficulty of oxidizing 
copper(I). 

Reprinted with permission from Veal and Rill, Noncovalent Binding of Bis (1, 10-phenanthrolene) Copper(i) and 
Related Compounds, Biochemistry 30, 1132-1140, © (1991) American Chemical Society. 
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means of directly determining molecular mass. This con- 7.2.3 Equipment Used in Centrifugation 


trasts with empirical methods of mass determination using 


approaches such as electrophoresis (Chapter 5), chromatog- 
raphy (Chapter 2) or mass/charge ratio (Chapter 4) which 
rely on comparison with molecules of known molecular 
mass. 


Centrifugation is performed in a centrifuge which provides 
a system for rotating sample at high speed around a central 
point so as to develop significant centrifugal force (Fig- 
ure 7.6). The sample is placed in a reinforced plastic tube 
which is then held in a rotor which rotates around a spindle. 
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Rotor Rotor 


chamber 


Sample 
tube 


Spindle: 
Figure 7.6. Outline design of a centrifuge. Samples contained in 
sample tubes (only four are shown for clarity) are placed in a rotor 
which is fixed on a spindle. Centrifugation takes place in the rotor 
chamber (dashed lines) which is often armoured, refrigerated and 
evacuated. 


Additional possible features include refrigeration (which 
usually also involves evacuation of the rotor chamber), han- 
dling different volumes of samples, ability to achieve ex- 
tremely high speeds and the possibility of analysis of the 
sample during the experiment. High-speed centrifuges are 
usually heavily armoured around the rotor chamber to pro- 
tect against accidents which might involve the spindle break- 


(a) 


(b) 


ing and release of the rotor (for this reason it is essential to 
ensure that sample tubes placed on opposite sides of the 
spindle are evenly-balanced by weighing them before the 
experiment). 

The rotor may be of two general designs (Figure 7.7). In 
fixed angle rotors the tube is placed in a machined hole in 
the metal rotor which is at a fixed angle relative to the ver- 
tical axis of the spindle. This angle does not change during 
the centrifugation experiment and the sample is centrifuged 
against the side-wall of the tube. In swinging bucket rotors, 
the tube is placed in a holder which is suspended from the 
rotor. During the experiment, the holder swings out to be- 
come horizontal with the horizontal axis of the rotor and 
sample is centrifuged towards the bottom of the tube. Be- 
cause rotor design varies so much between manufacturers 
and the distance between the spindle and the sample (r in 
Equation (7.15) above) can differ widely, it is often conve- 
nient to express centrifugal force as a value relative to that of 
the Earth’s gravitational field called the relative centrifugal 


force, g; 


g =1.119-107>- (rpm)? -r (1.22) 
oe 
F 
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Figure 7.7. Rotors used in centrifugation. (a) Fixed-angle rotors. The rotor is solid steel with openings machined in it for sample tubes. The 
angle of the tubes does not change during the experiment and particles are sedimented against the back of the tube, (b) Swinging bucket 
rotor. The sample tubes are suspended on the rotor and swing out during the experiment. Particles sediment towards the bottom of the tube. 


where rpm is the number of revolutions per minute of the 
rotor. Manufacturers usually provide a scale for each rotor 
by which rpm values on the instrument can be converted 
into g values in which centrifugation conditions should be 
reported. Different rotors accept sample tubes of different 
size allowing a range of volumes of sample to be centrifuged. 

Three levels of centrifuge can be described in order of 
increasing sophistication (and price!). The first two would 
be encountered by most practicing biochemists and molec- 
ular biologists while the third is normally only used by 
specialists. 

The most widespread type of centrifuge is the bench- 
top centrifuge. This may lack a refrigerated rotor chamber 
and can accept small volumes (e.g. 1-2 ml) of sample for 
centrifugation at relatively low speed (e.g. 6000 g). This is 
useful for collecting precipitated protein or DNA as a com- 
pact pellet from the bulk sample (e.g. from ammonium sul- 
fate or ethanol, respectively). Higher speed versions need 
to be refrigerated because high speed heats up the rotor 
due to air friction. Small volume benchtop centrifuges are 
sometimes called minifuges and these can achieve relative 
centrifugal forces of some 12 000 g with 0.5-1.5 ml sample 
volumes. 

High-speed large-volume ultracentrifuges can reach rel- 
ative centrifugal force values of up to 150 000 g. These may 
accept swinging bucket or fixed angle rotors, have refrig- 
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erated (and evacuated) temperature-controlled rotor cham- 
bers and can accept sample tubes with volumes from 10 to 
250ml. They would be routinely used in centrifugation of 
tissue extracts and in recovery of precipitated protein from 
ammonium sulfate precipitation. 

Analytical ultracentrifuges achieve relative centrifugal 
forces of up to 600000 g and facilitate analysis of the be- 
haviour of sample particles during the centrifugation ex- 
periment. The development of computer-controlled XL-A 
and XL-I centrifuges by Beckman Instruments combined 
with developments in data analysis has led to a resurgence 
of interest in analytical centrifugation. Such measurements 
allow the determination of M, value with the Svedberg equa- 
tion (Equation (7.21)) provided the other parameters in that 
equation are known. A feature of analytical centrifuges is 
the presence of an optical system for detection of particle 
movement during the experiment. These depend on either 
direct absorbance measurements (for the XL-A in the range 
190-650 nm; Chapter 3) or on comparison of refractive in- 
dex of solvent alone with that of sample which results in in- 
terference (the XL-I has both absorption and interference op- 
tics). These measurements follow the absorbance/refractive 
index of particles as a function of position during centrifu- 
gation by means of a window in the cell containing sam- 
ple which passes through the detection system once during 
each revolution (Figure 7.8). Fluorescence detection brings 


Data 


analysis 


Sample 
cell 
Single-sector 


Light source 


Double-sector 


Figure 7.8. Outline design of an analytical ultracentrifuge. Sample is placed in a sample cell in a rotor contained in an evacuated and 
refrigerated chamber. Centrifugation is carried out at very high speed during which the behaviour of the sample is analysed. The cell is 
scanned with parallel light as it passes through the detection system on each revolution. Transparent windows at the top and bottom of 
sample cells allow light to pass through the sample. Optical systems use absorbancc, fluorescence or refractive index to detect sample 
concentration as a function of position in the cell. Refractive index measurements of sample is continuously compared to that of solvent 
alone in a double-sector sample cell shown on the right. This data is continuously collected during the experiment, stored in a microcomputer 


and subsequently analysed. 
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Figure 7.9. Subcellular fractionation. A tissue homogenate may be fractionated into nuclear, mitochondrial, microsomal and soluble 
fractions. These are enriched for biomolecules normally associated with the organelles/complexes shown. Centrifugation conditions between 
those described are also possible to fractionate the homogenate further. Cross-contamination between fractions can be assessed with suitable 


biochemical markers (Table 7.1). 


the detection limit as low as 107!° M for proteins and DNA 
labeled with extrinsic fluors. 

The following (Sections 7.2.4 to 7.2.6) describe some 
applications of centrifugation in biochemistry. 


7.2.4 Subcellular Fractionation 


Individual subcellular organelles of eukaryotic cells may be 
regarded as individual types of particles with characteris- 
tic ranges of mass and density. Centrifugation of cell ho- 
mogenates therefore allows preparation of highly-purified 
fractions enriched in one or other cell compartment such 
as the nucleus, endoplasmic reticulum or mitochondria in a 
process called subcellular fractionation (Figure 7.9). This 
is a useful experiment to identify the most likely subcellular 


location of a specific biomolecule or process. In carrying out 
such experiments it is important to assess the quality of sep- 
aration of individual fractions and this is achieved by using 
marker molecules (usually enzymes) known to be expressed 
in a compartment (Table 7.1). 


Table 7.1. Marker molecules useful in subcellular fractionation 
studies 


Compartment Marker 

Nucleus DNA 

Cytosol Lactate dehydrogenase 
Mitochondrion Succinate dehydrogenase 
Lysosomes Alkaline phosphatase 
Microsomes Cytochrome P-450 


Ideally, 100% of the marker present in the extract should 
be detected in the subcellular fraction corresponding to its 
known subcellular location with none being detectable in 
other fractions. Greater and lesser values than 100% can be 
found as a result of the removal of some inhibitory molecule 
to another fraction (greater) or loss of activity due to denat- 
uration or proteolysis (lesser). Moreover, in practice some 
contamination between fractions usually occurs due to han- 
dling errors but measurement of the markers allows this to 
be quantified. If a protein under investigation is found, for 
example, to mirror the distribution of succinate dehydroge- 
nase across the subcellular fractions, then it is reasonable to 
conclude that its subcellular location is the mitochondrion. 
Isolation of high-purity subcellular organelles is a key step 
in organelle proteomics (Chapter 10). 


7.2.5 Density Gradient Centrifugation 


We can make use of the dependence of sedimentation ve- 
locity on solvent and particle density (Equation (7.16)) by 
centrifuging particles through a medium of gradually in- 
creasing density. This is achieved by establishing a density 
gradient between a region of high density at the bottom of 
the centrifugation tube and a region of low density at the 
top (Figure 7.10). The density gradient can be established 
either before particle centrifugation (zonal centrifugation) 
or during centrifugation (isopycnic centrifugation). 

The zonal centrifugation experiment involves creation of 
a density gradient before centrifugation by mixing high and 
low density solutions of materials such as sucrose and glyc- 
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Figure 7.10. Density gradient centrifugation. A density gradient 
(dashed line) can be formed either during or prior to centrifugation 
due to a concentration of high-density material at the bottom of the 
centrifuge tube and correspondingly lower density near the top. An 
ideal, linear density gradient is shown although other gradients with 
other shapes are also possible. 
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erol (Figure 7.11). This gradient is created in a centrifuge 
tube prior to centrifugation and sample is then layered on 
top. Particles sediment through the gradient with gradually 
decreasing velocity. This is because the term (pp — Pm) 1s 
large at low density giving a relatively high velocity (Equa- 
tion (7.16)). As the density of the medium increases (pm) on 
passage through the gradient, this term gradually decreases. 
Eventually a point in the gradient is reached where pp = Pm, 
resulting in sedimentation velocity values of zero. 

This means that, because different particle populations 
have different densities, they sediment at different rates and 
separate out from the sample mixture to different densities. 
If the gradient is collected in individual fractions, highly- 
purified particles may be obtained in this way. 

In isopycnic centrifugation, the metal salt CsCl, is used 
because of the high density possible with this material (up 
to 1.8 g/ml). A solution of CsCl, is mixed with the sample 
and then centrifuged which forms a CsCl, density gradi- 
ent from low density at the top of the centrifuge tube to 
high density at the bottom (Figure 7.12). Sample particles 
in the region of highest density at the bottom of the tube 
are less dense than the surrounding medium. They there- 
fore tend to float towards the top of the tube to a region 
of lower density. The driving force for this is that centrifu- 
gal force drives salt molecules towards the bottom of the 
tube thus exerting upward pressure on particles of lower 
density. Conversely, some of the same type of particle in 
the region of lowest density at the top of the tube will sed- 
iment towards the bottom of the tube in accordance with 
Equation (7.16). Eventually each particle collects and con- 
centrates in a narrow band at its isopycnic density. A classic 
application of this is the Meselson-Stahl experiment which 
demonstrated the semiconservative nature of DNA repli- 
cation by separating ‘heavy’ ('°N-containing) DNA from 
‘light’ ('4N-containing) DNA in a density gradient. 

With the exception of lipoproteins (which have quite low 
density), the densities of protein molecules are generally 
similar and this technique is not very useful for separat- 
ing proteins from each other (the solutions that would be 
required would also not be compatible with native protein 
structure). It has been widely-used, however, for separating 
different forms of DNA (i.e. plasmid from chromosomal; 
supercoiled from uncoiled) and for preparation of macro- 
molecular assemblies (e.g. viruses) and individual cell types. 
For DNA experiments, it is usual to add the extrinsic fluor 
ethidium bromide to the solution prior to centrifugation. 
This intercalates into the double-stranded DNA allowing 
its convenient visualization under a UV lamp (Section 3.4; 
Table 3.4). Materials such as ficoll and percol are particu- 
larly useful for separation of individual organelles or cell 
types by density gradient centrifugation since they do not 
subject particles to high osmotic pressure or shear forces. 
Density gradients (whether prepared by centrifugation or 
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Figure 7.11. Formation of a density gradient prior to centrifugation. Two solutions of sucrose or glycerol corresponding to the lowest and 
highest density required are placed in separate reservoirs as shown. The low density solution is connected to the high density solution by 
tubing filled with low density solution. The high density solution is similarly connected to the bottom of the centrifuge tube by tubing filled 
with high density solution. A peristaltic pump is used slowly to fill the centrifuge tube (direction of flow shown by arrows). As the level of 
the high density reservoir drops, that of the low density reservoir also drops in response to hydrostatic pressure. Constant stirring of the high 
density solution ensures gradual lowering of solution density and formation of a smooth gradient. 


otherwise) provide an extremely useful general method to 
determine the density of a wide range of particles such 
as viruses, organelles, cells and crystals of protein/DNA 
(Chapter 6). 


7.2.6 Analytical Ultracentrifugation 


So far we have described two general types of centrifugation 
experiment. Differential centrifugation allows separation of 
organelles and other biochemical compartments from each 
other on the basis of differential sedimentation in a centrifu- 
gal field. Density gradient centrifugation allows separation 
of particles in density gradients based on their differing den- 
sity. Despite their obvious differences in methodology, both 
of these experiments involve analysis after centrifugation 
has been carried out. 

It is also possible to combine analysis with the pro- 
cess of sedimentation and to learn about certain physical 
properties of particles from analysis of their dynamic be- 
haviour in the centrifugal field. This is called analytical ul- 
tracentrifugation and, due mainly to improvements in equip- 
ment and data analysis, this field is currently undergoing a 
renaissance. 

Analytical ultracentrifugation offers a number of major 
advantages in that it is possible to study the behaviour of 
biomacromolecules under a wide range of solvent conditions 
where they will not interact with support materials (which 
is acommon problem with electrophoresis and chromatog- 
raphy systems). Moreover, sedimentation analysis allows 
behaviour of macromolecules across a wide concentration 


range to be studied in a single experiment whereas most 
other methods described in this volume require individual 
measurements to be performed at each sample concentra- 
tion. Thirdly, samples in a size-range up to tens of millions 
of daltons which are difficult to study by other methods (e.g. 
viruses) are amenable to sedimentation analysis. In current 
practice, two distinct types of analytical centrifugation ex- 
periment are commonly performed and these are described 
in the following two sections. 


7.2.7 Sedimentation Velocity Analysis 


Sedimentation velocity analysis involves following the be- 
haviour of sample particles as they move through the solvent 
during sedimentation at relatively high centrifugal speeds. 
Particles become distributed in a concentration gradient of 
characteristic shape which changes during the experiment 
(Figure 7.13). At the beginning of the experiment, sam- 
ple is distributed uniformly through the sample cell. The 
lamella of molecules at the interface between this solution 
and air is referred to as the boundary. During centrifuga- 
tion, these molecules move towards the bottom of the sam- 
ple cell which causes a change in the boundary position and 
shape which may be followed by absorbance, fluorescence 
or refractive index. This is because sedimentation forces the 
sample particles to move through the solvent away from 
the centre of rotation while diffusion causes spreading of the 
boundary resulting in a change in boundary shape during the 
experiment. Computer methods for analysis of these bound- 
aries have been greatly improved in recent years making a 
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Figure 7.12. Formation of a density gradient during centrifuga- 
tion. (a) A solution of uniform density of a material such as 
CsCl) is mixed with sample and placed in a centrifuge tube, 
(b) During centrifugation centrifugal force (arrow) concentrates 
CsCl? towards the bottom of the tube leading to greater density 
at the bottom and lower density at the top. Sample particles frac- 
tionate as they concentrate at their individual isopycnic densities. 
The shape of the density gradient obtained is shown in the graph 
(right), together with isopycnic densities of two particles from the 
sample. 


number of physical and chemical parameters accessible to 
determination by this method. These include the buoyant 
mass [m,(1 — v - p)], sedimentation and diffusion coeffi- 
cients of individual molecular species. 

The sedimentation coefficient, s, is the ratio of velocity 
to acceleration of the centrifuged particle which is related 
to physical and hydrodynamic properties of the sample and 
solvent (Equation (7.17)). If we could measure the posi- 
tion, r, of a boundary formed by a dilute solution of the 
particle at any time, t, we could experimentally determine 
the apparent sedimentation coefficient, s*, by sedimentation 
velocity. This is determined by measuring the rate of move- 
ment of the boundary through the column of solvent in the 
centrifugation cell; 


* 


_dinr 
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Figure 7.13. Sedimentation velocity. Sample is dissolved in sol- 
vent in sample cell. It is then centrifuged at high speed as shown 
and absorbance (or fluorescence or refractive index) are measured 
along the cell at various times during the experiment. Data for tı to 
15 are shown. The sample meniscus (arrow) is the interface between 
sample and air. This is referred to as the boundary. The shape and 
position of the boundary changes with time and this data is analysed 
to determines s, D and M,. 


Which is related to the position of the meniscus at the top 
of the solution, rn, by; 


_ Inr/rm 
~ fadt 


ak 


(7.24) 


This relationship may be used to convert boundary posi- 
tions relative to the meniscus to values of s*. 

Diffusion is responsible for boundary spreading during 
the experiment and may be described by Fick’s first law 
which relates the diffusive flux, Jp to the concentration of 
material, c, at any point, r, by the diffusion coefficient, 
D. This may be experimentally determined as the rate of 
boundary spreading dc/dr; 


Jp=—D.- (7.25) 


or 
D can therefore be determined simultaneously with s* 
by sedimentation velocity in relatively long centrifugation 
cells. 

For more concentrated particle solutions, a number of 
corrections need to made to s* and D to allow for fac- 
tors such as hydrodynamic and thermodynamic nonideality. 
For example, solvent displaced by one particle can have 
an effect on sedimentation of another particle in concen- 
trated solutions. These corrections result in a value for the 
parameters which would apply at standard conditions of 
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near-zero concentration, at 20°C, in pure water; 55), and 
Dow: 

These parameters have a number of practical uses. The 
buoyant molecular mass can be estimated from the values 
of S%,w and D5, using the Svedberg equation (Equation 
(7.21)). If the molecular weight of the particle is already 
known, it is possible to estimate its degree of solvation (e.g. 
protein hydration) and overall shape during sedimentation 
because s3 ,, depends strongly on the frictional coefficient, 
f, which in turn depends on the Stoke’s radius, R,; 


f = 6 R; (7.26) 


Both degree of solvation and deviation from a compact 
spherical shape increase R, and hence f so that unexpected 
values for s5o ,, can be used to detect and quantify solvation 
and shape effects. 

The analysis of more complicated experimental systems 
such as interacting systems (e.g. macromolecule-ligand) and 
multi-component systems (in which several boundaries are 
formed during sedimentation) is more complex but is fa- 
cilitated by the availability of computer programs for data 
analysis which incorporate mathematical models for such 
systems. Fitting of experimental data to these models al- 
lows rapid estimation of quantities such as stoichiometry 
and binding constants for ligands (Example 7.3). More in- 
formation on the analytical approaches applicable to such 
complex systems may be obtained from the excellent re- 
views cited in the bibliography at the end of this chapter. 


7.2.8 Sedimentation Equilibrium Analysis 


The sedimentation conditions for sedimentation equilibrium 
analysis differ from those used in sedimentation velocity in 
that much lower centrifugal speeds are used in much shorter 
columns of solvent. An equilibrium is formed between sed- 
imentation and diffusion which results in formation of a 
smooth concentration gradient rather than fixed boundaries 
(Figure 7.14). Because formation of a stable equilibrium is 
quite slow, sedimentation equilibrium experiments may re- 
quire days of centrifugation making them also much slower 
than sedimentation velocity. 

If we consider a plane of area, A, the diffusive flux, Jp 
(Equation (7.25)), is equal to the sedimentation flux, Js 
under these conditions; 


(7.27) 


where c is the concentration of the particle. Combining 
Equations (7.25) and (7.27) gives us; 
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Figure 7.14. Sedimentation equilibrium. Centrifugation achieves 
an equilibrium between the rate of diffusion and of sedimentation. 
The time required for this is usually 70-100h depending on the 
simplicity of the system, particle characteristics, solvent viscosity 
and other factors. Achievement of equilibrium can be confirmed 
by superimposing scans collected several hours apart. The position 
of the equilibrium (i.e. the shape of the concentration gradient) 
depends on thermodynamic properties of system components. The 
gradient shown is idealised for simplicity as real concentration 
gradients have characteristic and not neccessarily linear shapes. 


or; 


M -o 
o= (7.29) 
RT 


The concentration at any point of position r is related to 
the concentration, co, at a reference radius, ro by; 


clr) = co e7? (7.30) 


where £ is (r? — r2)/2. 

Rotor speeds (i.e. œ) are selected such that the value for 
o lies in the range 2 to 15 cm~?. The ratio of concentration 
at the base of the cell is 2-10 times that at the meniscus in a 
typical 3 mm centrifugation cell. A range of concentrations 
are used in different cells so that, in a typical experiment, a 
100-fold range of concentration is used. 

Provided that appropriate solvent conditions are chosen 
so that it does not contain very high concentrations of 
individual molecules (e.g. 8M urea) or molecules that 
bind to proteins in significant quantities (e.g. detergents), 
the concentration gradient developed in a sedimentation 
equilibrium experiment is directly related to the chemical 
potential of the particle which, in turn, is related to its Gibb’s 
free energy, G. In other words, while hydrodynamics may 
affect the time required to achieve equilibrium, the actual 
concentration gradient achieved is determined by thermo- 
dynamics. If the concentrations at two points in the gradient, 
rı and r2, were expressed as cı and c2, the potential energy 
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Example 7.3 Effects of Vinca alkaloids on tubumin polymerization studied by sedimentation velocity 


Certain substances originally derived from the Vinca plant called the Vinca alkaloids are used extensively in cancer 
chemotherapy. Vincristine and vinblastine inhibit assembly and dynamics of microtubules and in this way act selectively 
against rapidly-dividing cells such as cancer cells. Jn vitro studies demonstrate that these molecules result in extensive 
microtubule depolymerization and promote formation of coiled spiral aggregates of tubulin heterodimers instead of 
microtubules. There is some interest therefore in knowing more about the interaction of Vinca alkaloids with tubulin 
and, in particular, on the relative effects of GTP and GDP on this interaction since the presence of GDP enhances 
vinblastine-induced spiral formation. 

Because tubulin self-association leading to spiral formation results in a larger protein assembly, one way of following 
the process is by sedimentation velocity. This technique also allows determination of equilibrium constants governing 
binding of alkaloid to tubulin heterodimers (K,), of alkaloid-tubulin heterodimers to spiral aggregates (K2), of free 
alkaloid to tubulin polymers (K3) and of unliganded tubulin heteropolymers to spiral aggregates (K4) under a variety of 
experimental conditions (see model). 


Tubulin + K, 
heterodimer L S 


Vinca 
alkaloid 


Spiral 
aggregate 


Sedimentation velocity experiments were performed with self-associating tubulin in the presence of vincristine, 
vinblastine and a synthetic analogue, vinorelbine. The effects of adding either GDP or GTP was assessed for these three 
Vinca alkaloids at a range of temperatures (5, 25 and 30°C) and at either 30 000 or 40 000 RPM. Protein was quantified 
in the boundaries by measuring A»go. Fitting of sedimentation data allowed estimation of S29, w and equilibrium binding 
constants. 

Sedimentation data for tubulin in the presence of GTP for A) vincristine, B) vinblastine and C) vinorelbine are shown 
below. These data were obtained at 5 °C (solid line), 25 °C (dashed line) and 35 °C (dotted line). The increase in S20, w 
with temperature suggests a largely entropically driven process. 
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Variation in S39 w as a function of alkaloid concentration in the presence of GTP (open boxes) or GDP (filled boxes) are 
shown below. These confirm that the maximum S2ọ,w in the presence of GDP is higher for all three alkaloids compared to 
GTP. Moreover, vincristine promotes formation of a larger species than vinblastine with vinorelbine producing a species 
only half as large under the same experimental conditions. These data confirm that GDP enhances tubulin assembly 
relative to GTP for all three Vinca alkaloids. 
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M (Continued) 


Fitting of sedimentation data allowed estimation of equilibrium constants for the model shown above. Selected data 
obtained for vincristine and vinorelbine at 5 °C in the presence of GTP and GTP are as follows; 


distribution in the gradient could be expressed as a Boltz- 
mann distribution; 
CL gE Eg) eT 


(7.31) 


C2 


where k is the Boltzmann constant (Appendix 1), T the 
absolute temperature and EF; and Ez the potential energy 
of the solute at points rı and r2, respectively. The potential 
energy difference between molecules at r; and rz is given 
by; 


(E; = E2) = 1/2- im fi — v- p]: o (r = i) (1.32) 


Combining Equations (7.31) and (7.32) and replacing k 
by the Universal Gas Constant R (Appendix 1) gives us an 
expression for molecular mass, M,, in units of daltons which 
may be determined from equilibrium centrifugation data; 


2R-T In c, 1 


Sea ae (7.33) 


E 
r2 — a2 


where c, is the concentration of solute at a distance r from the 
axis of rotation c, is the concentration at the meniscus and 
a is the distance of the meniscus from the axis of rotation. 
This relationship allows calculation of M, from the equi- 
librium distribution of molecules through the concentration 
gradient formed in the experiment because a plot of In c, 
versus r° should yield a slope of M,(1 — v - e)/2RT. Note 
that, unlike M, determination by sedimentation velocity (see 


Drug GXP Kı K: K3 
M™! x 10° M™! x 10° M™! x 10° 

vincristine GTP 2a) 2.9 71 

GDP 22 6.4 140 

vinorelbine GTP 0.78 0.95 73 

GDP 0.86 1.9 16 


This demonstrates how sedimentation velocity facilitates study of a wide concentration range involving simultaneous 
analysis of a large number of individual samples. This allows determination of detailed affinity data for complex 
biochemical processes involving proteins such as, in this case, assembly into large complexes. 

Reprinted with permission from Lobert, Vulevic and Correla, Interaction of Vinca Alkaloids with Tubulin: A compar- 
ison of Vinblastin, Vincristine and Vinorelbine, Biochemistry 35, 6806-6814, © (1996) American Chemical Society. 


previous section), knowledge of the diffusion coefficient is 
not required. 

A number of other useful measurements can be deter- 
mined from sedimentation equilibrium data. These include 
accurate estimates of particle density and thermodynamic 
information about association constants, binding stoichiom- 
etry and solution nonideality (Example 7.4). Deviation from 
the ideal behaviour described in Equation (7.33) can be anal- 
ysed using models which simulate the presence of multiple 
components, hetero-/self-assembly and other forms of non- 
ideal behaviour. More detailed information on further uses 
of sedimentation equilibrium can be obtained from literature 
cited in the bibliography at the end of this chapter. 


7.3 METHODS FOR VARYING 
BUFFER CONDITIONS 


Biological samples are highly-sensitive to their immediate 
chemical environment (Chapter 1) and are best handled in an 
aqueous buffer at an appropriate ion strength. This helps to 
maintain the sample in a structure which retains maximum 
biological activity. Frequently, this activity may also be de- 
pendent on the presence of smaller species such as metals, 
coenzymes, allosteric modulators or stabilizing agents (e.g. 
reducing agents) and we may need to be able to change the 
buffer composition in a controlled way to vary the popula- 
tion of small molecules in contact with the sample. Alter- 
natively, we may want to perform a series of procedures in 
which each step may require distinct conditions (e.g. column 
chromatography steps in protein purification; Chapter 2). 
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Example 7.4 Sedimentation equilibrium studies of a complement serine protease 


The first component of the complement pathway, C1, consists of three interacting subcomponents, Clq, Clr and Cls. The 
two latter components are homologous serine protease zymogens which form a Ca?*+-dependent tetramer, C1s-C1r2-C1s 
which associates with Clq to form Cl. The zymogens are activated to functional serine proteases Cls* and Clr* in a 
process mediated by Ca?* which triggers the first step in complement activation. Cls forms a dimer when Ca?* binds 
at an N-terminal domain and the stabilization of the C1s-C1r2-C1s tetramer is also facilitated by Ca?* coordination 
to homologous N-terminal domains. Interaction with Ca?* is therefore a major feature of both complement structure 
and activation. Several binding sites of varying affinity have been identified by methods such as equilibrium dialysis 
(Section 7.3.2). 

Detailed analysis of Ca?*-binding requires a technique suitable for studying a wide range of Ca? and protein 
concentrations at equilibrium. Sedimentation equilibrium lends itself in particular to this type of analysis. Protein was 
equilibrated by dialysis with either 5mM or 1 nM Ca’*. It was then centrifuged at 6500-7000 rpm in swinging bucket 
rotors (90 ul volume) in a preparative centrifuge for 24hours. Note that, in contrast to the previous example, this 
centrifugation experiment was not carried out in an analytical centrifuge. Sample was collected as individual fractions 
and the equilibrium distribution of protein under these conditions was stabilized by inclusion of dextran-70 with the 
protein sample. 

Aogo was plotted as a function of radial distance for the two samples as shown below. The molecular mass, Mr, was 
estimated using essentially the equations described in Section 7.2.8 and was found to be 79 (1nM Ca?) and 143 kDa 
(5mM Ca?*), respectively. This suggests that Ca?+ promotes dimer formation between C1s* monomers. 


[C**]=1nM 


-[C**] = 5 mM 


8.25 8.30 8.35 840 845 


r(cm) 


Effect of Ca? on equilibrium centrifugation of C1s*. 

These data for equilibrium distribution were analysed using several models and the simplest model fitting the data was 
one in which the C1s* monomer contains a single Ca**-binding site (K, = 3 x 10° M7!) and the C1s* dimer contains 
three independent Ca**+-binding sites two of which have low affinity (K, = 3 x 104M7!) while the third (K, = 1 x 
108 M7!) is formed by the two N-terminal domains and provides the driving force for dimer formation (see diagram 
below). 


K,=3x 10° M7! 


<= 


K,=3x 10° M7! 


Reprinted with permission from Rivas, Ingham and Minto, Calcium linked Self Association of Human Complemetn 
C1*s, Biochemistry 31, 11707-11712, © (1992) American Chemical Society. 


This section will describe the principal methods of intro- 
ducing small molecules to solutions containing biomacro- 
molecules and, conversely, of removing small molecules 
from such solutions while retaining biomacromolecule 
quantity and activity. We will also describe techniques for 
recovery of biomacromolecules from solution. 


7.3.1 Ultrafiltration 


A variety of porous membrane materials are available for fil- 
tration of solutions which selectively retain large biomacro- 
molecules. The solution can be forced through these mem- 
branes under pressure which results in concentration of the 
protein/DNA molecules since solvent passes freely through 
the membrane while protein/DNA do not. This is called 
ultrafiltration. Commercially-available membranes contain 
a range of specified pore-sizes which give the membrane 
an average M, cutoff (e.g. 10 kDa). Most molecules smaller 
than this pass freely through the membrane (usually referred 
to as the permeate) while larger molecules are retained (the 
retentate). It is important to understand that these average 
cutoffs are usually specified for globular proteins and may 
differ from the actual M, values of fibrous proteins or DNA. 

An outline design of the apparatus in which the mem- 
brane is mounted is described in Figure 7.15. A stirring bar 
allows constant, gentle mixing during concentration while 
an inert gas such as nitrogen can be used to create a pressur- 
ized atmosphere over the solution. A problem sometimes 
encountered with ultrafiltration is that the very fine pores 
through the membrane can become clogged by denatured 
protein or other sample debris over time. It is therefore wise 
to clarify the sample by centrifugation before ultrafiltration. 
The membrane can also be cleaned by treatment with trypsin 
which proteolytically degrades any protein accumulated in 
the pores. Care should be taken that the ultrafiltration appa- 
ratus remains cold during concentration and that filtration 
does not proceed to the point where all solvent is lost since 
this can lead to denaturation of biomacromolecules. 

Ultrafiltration is useful in three main types of applications. 
The most obvious is concentration of biomacromolecule so- 
lutions prior to some further experimentation. Since rapid 
concentration is usually achieved, this can greatly shorten 
the time required for processes such as protein purification. 
For example, if a 200 ml solution of dilute protein is con- 
centrated to 20 ml in an hour, then this 20 ml can be applied 
to a chromatography column ten times faster than would 
the unconcentrated solution. The inclusion of the concen- 
tration step therefore speeds up the overall process and often 
improves recovery of activity since many proteins lose bio- 
logical activity when dilute. 

A series of concentration and dilution steps can be used 
to swap macromolecules from one buffer to another. The 
identity/concentration/composition of the buffer can all be 
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Figure 7.15. Ultrafiltration. (a) Outline of apparatus used. Sample 
is placed in a chamber where it can be stirred continuously. Ni- 
trogen gas forces sample through the membrane which will have a 
characteristic M, cutoff (e.g. 10 kDa). Molecules smaller than the 
cutoff pass freely through the membrane (permeate) while larger 
molecules are retained (retenlate). (b) More detailed description of 
ultrafiltration. (1) Sample contains molecules of a range of masses 
(only two are shown for simplicity). (2) Molecules smaller than 
the cutoff pass freely through pores in the membrane while larger 
molecules cannot. (3) This achieves size-fractionation of the sample 
in that the retentate is enriched for larger mass molecules while the 
permeate is enriched for smaller molecules. 


changed in this way. If, for example, 200 ml of buffer A is 
concentrated to 20 ml, and then diluted to 200 ml with buffer 
B, reconcentrated and the process repeated 3—4 times, then 
the macromolecules will eventually be surrounded by buffer 
B. Again, this type of procedure is very useful where suc- 
cessive procedures require quite different buffer conditions. 

A third type of application is size fractionation by use of 
different membranes in successive ultrafiltration steps. For 
example, if a protein solution which passes through a 50 kDa 
cutoff membrane is then passed through a 10kDa mem- 
brane, a fraction rich in molecules of mass-range 10-50 kDa 
can be selected. This is a comparatively easy way to achieve 
a significant enrichment during protein or peptide purifi- 
cation or to decide if a particular biological activity is as- 
sociated with small, intermediate or very large molecules 
(Example 7.5). 
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Example 7.5 Use of ultrafiltration to form a size-excluded peptide bank 


Many small peptides have pharmacologically-interesting properties and are major targets for drug discovery. Techniques 
such as biological assay or orphan receptor screening may be used to identify fractions containing promising peptides. 
In this example, a collection of endogenous peptides comprising a peptide bank was selected from pig brain. 

The first step in this process was ultrafiltration through membranes with a 50 kDa cutoff. This was achieved by preparing 
181 of extract by homogenization, centrifugation and filtration. A pressure of 1 bar was maintained at a temperature of 
5°C and a feed-rate of 100 I/h. Collection of permeate was stopped when 431 had been collected and it was acidified 
to pH 3.0 to prevent bacterial growth. This allowed the removal of proteins, nucleic acids and cell debris and produced 
a heterogeneous mixture of peptides suitable for further fractionation. This was achieved using a combination of cation 
exchange and reversed phase chromatography. 

Reproduced from Seiler et al. (1999) Application of a peptide bank from porcine brain in isolation of regulatory 
peptides. J. Chromatog. A, 852, 273-283, with permission from Elsevier. 
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Another format for filtration (Figure 7.16) is useful where 
it is necessary to sterilize solutions which would not with- 
stand heat sterilization. This is called sterile filtration and 
involves passage of the solution through a 2 filter (i.e. a fil- 
ter with 2 u pores) into a sterile container. This filter retains 
bacteria, fungi and spores thus resulting in a sterile solution. 
A common application for sterile filtration is where it is 
necessary to add heat-sensitive growth factors or antibiotics 
to cell culture media. 

Small scale ultrafiltration may be performed in 
commercially-available centrifugal concentrators. An out- 
line design of the apparatus used for this is shown in 
Figure 7.17. The sample to be concentrated (approx. 0.1—1.5 
ml) is centrifuged through a filter which retains molecules 
larger than the M, cutoff for that filter. As with large-scale 


ultrafiltration, a range of membranes with differing M, cut- 
off values are available. Centrifugal force achieves the same 
effect as gas pressure in ultrafiltration by forcing the bulk 
solution through the filter. The filter cartridge is designed 
to retain a small minimum volume (~10 ul) which prevents 
macromolecules from losing all solvent and thus becom- 
ing denatured. As with large-scale ultrafiltration, centrifugal 
concentrators can also be used to desalt or otherwise alter 
the buffer composition of macromolecule solutions. 


7.3.2 Dialysis 


Dialysis involves equilibration of two solutions across a 
semipermeable membrane. If the volume of one solu- 
tion massively exceeds that of the other, it is possible to 
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Figure 7.16. Sterile filtration. Filtration of buffers and other liquids 
through 2u filter retains bacteria and facilitates sterilisation of the 
solution. 


convert the buffer conditions of the smaller solution to that 
of the larger by repeated changes of the larger solution. Thus, 
dialysis provides a means to desalt or otherwise change the 
buffer composition of solutions of biomolecules too large 
to pass through the membrane. In practice, commercially- 
available dialysis tubing with a M, cutoff of 10 kDa is used 


Centrifugation 
Buffer and molecules 


smaller than membrane 
cutoff 


Figure 7.17. Centrifugal concentrators. Sample (approximately 
0.1-1.5 ml) is placed in a tube which contains a permeable mem- 
brane at one end. This membrane has a defined cutoff value (e.g. 
3kDa; 30kDa). Centrifugation forces buffer through the mem- 
brane, retaining molecules larger than the cutoff within the sample 
tube. Repeated dilution and concentration can achieve rapid buffer 
exchange. 
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Figure 7.18. Principle of dialysis. A solution of macromolecules 
(e.g. protein or nucleic acids) is placed in a dialysis bag and this 
is tied at each end with a knot or clip, allowing some space for 
water uptake as a result of osmosis. This is placed in a large volume 
of buffer and stirred for approximately 4—6 h. Buffer is changed at 
least twice. As shown in the detail presented in dashed box (right), 
molecules larger than the M, cutoff for the membrane (usually 
10 kDa) are retained inside the bag while buffer components and 
other molecules smaller than the cutoff equilibrate between the 
inside and the outside. 


(Figure 7.18). One end of the tubing is tied into a knot 
or compressed with a clip, the macromolecule solution is 
placed inside the tubing and the other end of the tubing is 
tied in a knot or clipped, taking care to squeeze air from 
the tubing and to allow some space for extra water to enter 
the tubing by osmosis. The dialysis tubing is then placed in 
a much larger volume of a solution of desired composition 
and allowed to equilibrate for 4-6h. During this time, the 
buffer components of the small volume solution equilibrate 
with the larger volume and vice versa. The bulk solution is 
changed at least twice which means that the small volume 
solution eventually becomes identical with the large vol- 
ume solution. Macromolecules, however, are retained by the 
membrane which allows recovery of protein/nucleic acids 
in a solution of predetermined composition. 

This approach has the advantages that it is applicable 
to a range of sample volumes and allows gentle alteration 
of the composition of macromolecular solutions. It has the 
disadvantage that, since equilibration across the membranes 
used is slow, the process is quite time-consuming which can 
often lead to loss of biological activity. 

Various adaptations of the basic dialysis technique are 
possible. Electrodialysis involves speeding up the ex- 
change of charged buffer components by exposing the 
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Figure 7.19. Principle of electrodialysis. If electrodes are placed 
in the buffer solution and a voltage applied, ions migrate towards 
the electrode of opposite charge. As shown in the detail presented 
in dashed box (right), cations move towards the cathode while 
anions move towards the anode (arrows). Molecules larger than 
the membrane cutoff may migrate within the dialysis membrane 
but still cannot pass through its pores. This speeds up desalting of 
proteins and is widely used in industrial scale protein purification. 


macromolecule solution to an electric field (Figure 7.19). 
Charged molecules migrate out of the solution contained 
in the dialysis chamber while macromolecules are retained. 
A similar type of approach underlies some electroelution 
experimental formats (Chapter 5). Microdialysis involves 
small-scale dialysis in which a large number of samples can 
be dialyzed simultaneously. 

In addition to its use in desalting, dialysis can also be used 
to estimate binding constants for small ligands. This exper- 
iment is called equilibrium dialysis. A protein (or a macro- 
molecular species capable of binding, e.g. a membrane re- 
ceptor on a vesicle) is placed inside one of two equal-sized 
chambers separated by a dialysis membrane (Figure 7.20). 
The M, cutoff for the membrane should be smaller than the 
M, of the macromolecule and larger than the M, of the lig- 
and. Equilibration occurs across the membrane such that the 
free ligand concentration ([S;]) becomes the same in both 
chambers although some of the ligand is bound to the pro- 
tein (as the PS complex). If the total ligand concentration is 
known, the fraction of ligand bound to the protein ([PS] can 
be calculated by subtracting [S;] measured at equilibrium 
from the total ligand concentration ([S,]). The total protein 
concentration ([ P,]) is also known and subtracting [PS] from 
this gives us the concentration of free protein (i.e. with no 
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Figure 7.20. Equilibrium dialysis. The binding constant (Ks) of 
a protein (large circles) for a small ligand (small circles) may be 
determined by equilibrium dialysis. Two chambers are separated 
by a semi-permeable membrane which has a M, cutoff smaller 
than the protein but larger than the ligand. Free protein and ligand 
are denoted as [Pr] and [Sf], respectively, while the protein-ligand 
complex is denoted by [PS]. The total ligand concentration (= [Sf] 
+ [PS]) and the total protein concentration (= [Pf] + [PS]) are 
known at the beginning of the experiment. [Sf] may be determined 
by measuring in the chamber as shown (right). Therefore, [PS] and, 
hence, [Pf] can be determined. Ligands are usually labelled (e.g. 
with radioactivity) to facilitate quantitation. 


ligand bound, [P;]). We can determine Ks for the protein for 
this ligand from the relationship Ks = ([P;] - [S¢])/[PS]. 


7.3.3 Precipitation 


The procedures outlined in the two previous sections fa- 
cilitate changing the buffer conditions of a macromolecular 
solution by taking advantage of the fact that small molecules 
can pass through pores which are too small to allow passage 
of large biomacromolecules. It is also possible selectively to 
precipitate biomacromolecules and thus remove them from 
solution (Figure 7.21). Some precipitation procedures are 
biocompatible in that they retain the biological activity of 
the macromolecule allowing them to be redissolved in a 
small volume of a different buffer. Even where precipitation 
conditions are chemically severe and lead to loss of biologi- 
cal activity, they may nonetheless be useful in concentrating 
sample for further analysis (e.g. precipitation of protein for 
SDS PAGE; Chapter 5). All precipitation procedures affect 
interactions between the biomacromolecule and the buffer 
responsible for keeping it in solution. 

Proteins are readily precipitated by treatment with a 
high concentration of a salt such as ammonium sulfate 
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Figure 7.21. Precipitation methods useful with biomacromolecules. An overview of biocompatible (left) and non-biocompatible precipi- 
tation methods for proteins and DNA is shown. Precipitated material is collected by centrifugation. In the case of biocompatible methods, 
fully active DNA/protein is usually recovered. The pellet is normally washed followed by further centrifugation. 


[(NH4)2SO,] which interferes with hydrogen bonds between 
the protein surface and the solvation shell of water. This 
process is know as salting out and the precipitate may be 
collected by centrifugation and resolubilized in an alterna- 
tive aqueous buffer which usually results in recovery of fully 
bioactive protein. Since the volume of buffer used for protein 
solubilization is chosen by the experimenter, it is possible 
to concentrate protein by this method. It is important to un- 
derstand that a minimum protein concentration is required 
for ammonium sulfate precipitation. If the protein solution 
is too dilute, no precipitate is generated and it is necessary 
to concentrate the solution by some other method such as 
ultrafiltration before repeating the precipitation. 

A second means of biocompatible precipitation is im- 
munoprecipitation. This requires that antibodies to one or 
more antigens on the sample are available. Binding of anti- 
bodies to antigens in the protein solution forms a complex 
of low solubility called immunoprecipitates which can be 
collected by centrifugation. This is a useful method to pu- 
rify a single protein or group of structurally-related proteins 
from a crude mixture. 

If subsequent analysis or experimentation does not re- 
quire fully active protein, precipitation may be achieved with 
an organic solvent such as acetone. This alters the physical 


properties of the bulk solvent, particularly its dielectric con- 
stant and polarity. These effects completely disrupt protein 
structure, leading to denaturation and precipitation of the 
protein. Acetone precipitation has the advantage that it is 
effective at much lower protein concentrations than ammo- 
nium sulfate. The protein solution is mixed with an excess 
of acetone which has been cooled to —20°C and precip- 
itated protein is collected by centrifugation. Precipitation 
with a concentrated acid such as trichloroacetic acid (TCA) 
causes extremely low pH resulting in precipitation. The min- 
imum protein concentration required for TCA precipitation 
is higher than that for acetone precipitation. To minimize 
the risk of acid hydrolysis of peptide bonds, the experiment 
is performed on ice and care should be taken to remove 
residual acid by washing the protein pellet with acetone. 

A similar approach to precipitation of nucleic acids in- 
volves precipitation with ethanol. DNA and RNA precipi- 
tate readily from ethanol solutions allowing their recovery 
by centrifugation. However, in handling RNA, it should be 
borne in mind that ribonucleases are ubiquitous on labo- 
ratory plastic- and glassware and fingertips. Test-tubes and 
other materials used in the handling of RNA should there- 
fore be baked in diethyl pyrocarbonate before use and gloves 
should be worn at all stages. 
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Figure 7.22. Lyophilisation. A frozen sample is placed in a refrig- 
erated chamber which is then evacuated. Water, organic solvents 
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and volatile buffer components such as ammonia and carbon diox- 
ide are removed by a process of sublimation. Solute (e.g. proteins 
or peptides) are collected as a dried and usually denatured residue. 
Preweighing of sample vial allows estimation of yield by weighing 
the vial after drying. This process is also referred to as ‘freeze- 
drying’. 


A final method for recovery of biomacromolecules from 
solution is selective removal of the solvent by lyophilization 
or freeze-drying. This is facilitated by first transferring the 
biomacromolecules into a buffer containing volatile com- 
ponents such as ammonium bicarbonate using ultrafiltration 
or dialysis (Figure 7.22). The sample is frozen to —198 °C 
by dipping it in liquid nitrogen. It is then placed in an evac- 
uated chamber in which sublimation can occur; a molecule 
passes from the solid phase directly to the gas phase with- 
out passage through the liquid phase. Water and ammonium 
bicarbonate are readily removed from the frozen sample 
in the form of water vapour, NH; and CO, leaving the 
macromolecular component of the sample as a fine solid. 
The lyophilized sample may lack biological activity as a 
result of this process involving as it does removal of sol- 
vent and exposure to atmospheric oxygen. However, this is 
a convenient means of preparing large volume samples for 
subsequent chemical analysis. 


7.4 FLOW CYTOMETRY 


Flow cytometry is a technique which facilitates analysis of 
cell populations suspended in a fluid as they pass through 
a narrow orifice in single file at a rate of many thousands 
of cells per second. Optical and electronic measurements 
can be performed on individual cells as they pass through a 
detector which allows us to fractionate the population into 
discrete subpopulations and, in this way, to profile the cell 
population. This can be useful in analysis of complex cell 
mixtures such as blood or in assessing changes in a pop- 
ulation of cells as a result of experimental manipulation. 


In addition to analytical applications a preparative proce- 
dure known as cell sorting is possible in which a specific 
subpopulation of cells is purified from the cell population 
on the basis of measurable biochemical attributes such as 
expression of a particular receptor or biomacromolecule. In 
principle, any biochemical association with a cell or particle 
is amenable to analysis by flow cytometry provided a mea- 
surable physical property or fluorescent tracer is available 
which reacts stoichiometrically and specifically with a tar- 
get such as a receptor, DNA, RNA or protein. In this section 
we will look at the experimental design and application of 
flow cytometry to biochemical analysis. 


7.4.1 Flow Cytometer Design 


An outline design of a flow cytometer is shown in Figure 
7.23. The sample cell population is directed along a narrow 
quartz tube leading to an injector in the core stream. This 
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Figure 7.23. Outline design of a flow cytometer. Cells are carried 
in the core flow which directs them through the detector zone in 
single file. A transverse view is shown in the dashed box on left. The 
detector is an area where the beam froma laser is focused. Low angle 
scattered (forward-scattered) light is detected at detector 1, while 
right angle scattered and fluorescent light is detected at detector 2. 
Selective filters allow detection of specific emission wavelengths. 
The detail on right shows detection of a fluorescently-labelled cell 
as it passes through the detector. 


is surrounded coaxially by a tube containing a stream of 
buffered medium called the sheath stream. The purpose of 
this arrangement is to concentrate cells in the inner tube 
(where measurements will be made) while avoiding tur- 
bulent or parabolic flow. The injector narrows to a small 
internal diameter as shown such that cells are compelled to 
pass in single file through a flow cell which places them in 
the centre of an optical detection system. Typical internal 
diameters for the tube through which the core stream flows 
are 3 mm at the top narrowing to between 75 and 200 um at 
the injector. This tapering is crucial to the design of a flow 
cytometer since it avoids the parabolic flow which arises 
in aqueous solution as a result of viscosity (Section 7.1.1; 
Figure 7.1). 

The detector consists of an elliptical laser-beam (e.g. an 
argon laser; Chapter 6) that interacts with the cells scattering 
light which may be detected at a low angle (e.g. 0.5—10°) as 
forward scattered light and also at a high angle (e.g. 90°) 
as right angle scattered light. Low angle scattered light is 
usually taken as a general measure of cell size while light 
scattered at a right angle is a measure of the refractive index 
of the cell, in turn reflecting the number of intracellular gran- 
ules it contains. Fluorescence may also be detected at 90° 
and can be used to quantify particular biochemical events 
such as receptor binding or binding at specific DNA or RNA 
sequences (see below). The usual arrangement of the optical 
detection system is orthogonal in that the laser is arranged 
at 90° to the axis of flow while the detector is at 90° to the 
laser. 

This experimental setup allows rapid analysis of over 
5000 cells per second. A large number of parameters can 
be measured for each cell and the data electronically stored. 
Measurements can be analysed and reanalysed repetitively 
to identify progressively more highly defined subpopula- 
tions of cells within the overall population. 

Cells are presented to flow cytometry as disaggregated 
suspensions which may need to be released from the extra- 
cellular matrix by enzymatic or physical methods. It is also 
possible to analyse intact nuclei obtained from cells pre- 
pared as paraffin-embedded tissue sections by a combination 
of mechanical disruption with dewaxing, hydration and ex- 
posure to proteases. This latter approach facilitates analysis 
of nuclear DNA by flow cytometry to identify whether the 
cells from which the nuclei originated were haploid, diploid 
or polyploid as well as the stage of the cell cycle reached by 
the original cells. 


7.4.2 Cell Sorting 


The flow cytometer can be used to sort cells as shown in 
Figure 7.24. This involves selecting cells which the detec- 
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Figure 7.24. Cell sorting in flow cytometry. (a) The nozzle of the 
flow cytometer is vibrated axially (dashed ellipse) at a particular 
frequency by the effect of a drop-forming signal on a piezoelectric 
crystal. 40 000 vibrations per second gives 40 000 drops per second, 
(b) Labelled cells are detected by detectors 1 and 2 which pass 
a signal to a charge generator. This sends a drop-charging signal 
which confers a change (positive or negative) on the drop containing 
the labelled cell. In practice, drops in front of and behind that 
containing the detected cell are also usually charged as well, (c) On 
leaving the nozzle, charged drops arc deflected between charged 
plates and collected. Note that a second population of cells in the 
same sample could be labelled with a negative charge and collected 
in a second vial. 


tor indicates may be of interest (based on fluorescence or 
granularicity) for collection as droplets on which an elec- 
tric charge is conferred allowing their later separation in an 
electric field. This is achieved by vibrating the flow cell or 
nozzle using a piezoelectric crystal (similar to that used to 
select plane polarized light; Section 3.5.4) at a frequency 
selected to ensure uniform droplet size and a precise time- 
interval between detection and droplet formation. Detection 


288 PHYSICAL BIOCHEMISTRY 


of a cell of interest by detectors 1 and 2 triggers a charge- 
generating signal to the core flow. This applies an electric 
potential of ~100 V to the fluid containing the cell while 
still inside the flow cell or nozzle and, when the fluid forms 
a droplet, it will carry a corresponding electric charge. Two 
cell populations can be sorted from a single sample by ap- 
plying a positive charge to one and a negative charge to the 
other. 

The droplets leave the flow cell/nozzle and pass be- 
tween two electrically-charged plates which deflect them 
according to their electric charge. Droplets containing cells 
which are not of interest (which will not be detected 
by detectors) pass undeflected through the electrically- 
charged plates. In this way, subpopulations of cells can 
be purified from the overall population and used for fur- 
ther investigations. In practice, because of possible inac- 
curacy between detection and sorting, droplets in front of 
and behind the droplet of interest are also charged and 
collected. 


7.4.3 Detection Strategies in Flow Cytometry 


To obtain useful information from flow cytometry, it is es- 
sential that cells can be ‘seen’ by the detector. Detection 
can take advantage of intrinsic properties of the cell or 
exploit interactions between cells and extrinsic reagents. 
Selective information is obtained by controlling the wave- 
length and other characteristics of the laser light incident 
on the cells combined with variation in the angle of detec- 
tion and use of appropriate filters. Modern flow cytome- 
ters have the potential for simultaneous detection of scat- 
tered light at several wavelengths and angles in individual 
channels. 

Intrinsic properties useful in flow cytometry include re- 
fractive index and absorption/fluorescence properties of 
biomolecules within cells. As mentioned above, refractive 
index measurements quantify granularicity within the cells. 
Different subpopulations of cells would be expected to have 
altered granularicity as a consequence of changes in the rela- 
tive amount and activity of cell organelles. Subtle differences 
between cell subpopulations can therefore be detected in this 
way. Absorbance measurements may be used to quantify 
chromophores such as nucleic acids and haem-containing 
proteins essentially as described in Chapter 3. Useful intrin- 
sic fluors include pyridine and flavin nucleotides and mea- 
surement of these, for example, may be used to detect differ- 
ences in redox status between different cell subpopulations 
by measuring NAD: NADH + H* ratios. Specific groups of 
naturally-occurring chromophores such as porphyrins and 
chlorophylls may also be useful in the study of particular cell 


types. 


Because of its sensitivity, the most versatile detection 
method in flow cytometry is extrinsic fluorescence (Sec- 
tion 3.4). A number of extrinsic fluors are available with 
the same excitation wavelength but distinct emission wave- 
lengths. These may therefore be detected independently of 
each other by use of appropriate filters in the detection sys- 
tem. This means that 3—4 individual fluorescent labels may 
be detected in a single population of cells, allowing fraction- 
ation into a number of subpopulations. Possible strategies 
for introducing fluors include their accumulation in or on 
cells, covalent labeling of antigens or antibodies or covalent 
labeling of oligonucleotides (Example 7.6). This allows si- 
multaneous quantification of expression of several antigens 
ina single cell population, quantification of changes in such 
antigen profiles and detection of individual DNA sequences 
by in situ hybridization with labeled oligonucleotides. Stud- 
ies of nuclei allow quantification of chromosomes such as 
karyotyping, determination of the stage of the cell cycle 
and measurement of nuclear proteins such as proliferation 
factors. 


7.4.4 Parameters Measurable by Flow Cytometry 


Flow cytometry provides a generally-useful strategy for an- 
alyzing a wide range of cell parameters. Physical attributes 
of cells such as their volume, diameter, shape and surface 
area can all be measured and quantified even in complex cell 
populations. This may involve exposing the cells during flow 
to light beams arranged in several directions simultaneously 
and/or analyzing patterns of light scattering at a number of 
angles. These patterns can be analysed by microcomputer 
allowing deduction of physical parameters. Moreover, iso- 
lation of cells with predefined physical characteristics from 
complex mixtures is facilitated by flow cytometry using cell 
sorting. Measurement of biochemical attributes is only lim- 
ited by the specificity and sensitivity of the detection system 
used as outlined above in Sections 7.4.1 and 7.4.3. Combi- 
nation of physical and biochemical attributes allows quan- 
tification and preparation of cell subpopulations defined by 
a wide range of parameters. 

The literature cited in the bibliography at the end 
of this chapter (and papers cited therein) describes ap- 
plications of flow cytometry to measurement of mem- 
brane fluidity/permeability, intracellular pH, intracellular 
ion flux, mitochondrial activity, cell cycle status, expres- 
sion of specific protein/DNA/RNA species, study of intra/ 
extracellular receptors, detection/sorting of cancer/blood/ 
sperm/protozoan/plant/plankton/fungi/bacterial cells, cell 
differentiation and chromatin structure making it one of 
the most versatile and useful tools in the study of cell 
biology. 
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Example 7.6 Estimation of apoptosis by flow cytometry 


Programmed cell death, apoptosis, is associated with fragmentation of DNA into nucleosome-sized fragments (146 bp). 
These can be qualitatively detected as ‘ladders’ in agarose gel electrophoresis (Chapter 5). However, it is sometimes 
useful to quantify apoptosis in cell populations and flow cytometry is particularly useful for this purpose. 

A popular method is the terminal deoxynucleotidyl mediated dUTP nick end-labelling (TUNEL) assay. This involves 
permeabilizing cells with detergent followed by incubation with fluorescently-labeled dUTP (e.g. FITC-dUTP) and 
an enzyme called terminal transferase. This is a DNA polymerase which does not require a template. It catalyzes the 
addition of FITC-UMP units to the 3’ end of DNA fragments within the cell. The more nucleosomes in a given cell 
population, the stronger the fluorescence signal associated with the cells during flow cytometry. The percentage of 
apoptotic cells in the population can be determined using fluorescence activated cell sorting (FACS) analysis which 
quantifies DNA-associated FITC. 
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The TUNEL assay 
An essential step in the biogenesis of red blood cells is binding of the growth factor erythropoietin (EPO) to the 
erythropoietin receptor. This has effects on intracellular signal transducer/activator of transcription (STAT) proteins such 
as STAT5 which is rapidly activated. In particular, EPO rescues committed erythroid progenitors from the pathway 
leading to apoptosis, allowing them to proceed to form mature erythrocytes. Mice lacking both genetic loci for STATS 
(i.e. STAT5a~/~ 5b~/-) provide a useful model system to study this phenomenon. 

Foetuses of STATSa~/~ 5b~/~ are found to be far more anaemic than wild-type animals. An explanation for this is 
provided by TUNEL assay of erythrocyte progenitor liver cells in the presence or absence of EPO as shown below: 
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M (Continued) 


FACS analysis of wild-type and STATSa~~ 5b~/ liver progenitor cells. 
From these data it is clear that STATSa~‘~ 5b~/~ cells are much less responsive to the antiapoptotic effect of EPO than 
are wild-type. STAT5a~/~ 5b~"~ cells are approximately 2.5-fold more apoptotic than wild-type at the time of harvesting. 
When cultured for 24h in vitro in the absence of EPO they are almost twofold more apoptotic (that is 66% versus 39%). 
Even in the presence of 0.05 U/ml EPO, 36% of STATSa~/~ 5b~/“cells remain apoptotic compared to 8% of wild-type. 
These observations account for the fetal anaemia observed in STATSa~/~ 5b~/~ animals. 

Reproduced from Socolovsky et al., (1999) Fetal Anemia and Apoptosis of Red Cell Progenitors in Stat5a-/-Sb-/-mice: 
A direct Role for Stat5 in BCI-XL Induction. Cell 98, 181-191, with permission of Cell Press. 
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Chapter 8 


Biocalorimetry 


Objectives 


After completing this chapter you should be familiar with: 


¢ The physical meaning of the main thermodynamic parameters. 

° The physical basis of isothermal titration calorimetry. 

° The physical basis of differential scanning calorimetry. 

e Determination of thermodynamic parameters by noncalorimetric means. 


Heat is one the most important physical quantities in bi- 
ology. Biological systems and processes are highly heat- 
sensitive and will only function in relatively narrow tem- 
perature ranges. These ranges are largely defined by two 
factors: 


1. The exponential effect of temperature on chemical pro- 
cesses described by the Arrhenius equation, 


k = Ae Fal BT (8.1) 


where k is the rate constant, E, is activation energy (Sec- 
tion 8.1 below), R is the Universal Gas Constant (Ap- 
pendix 1), T is temperature and A is a constant called the 
frequency factor which reflects the total number of col- 
lisions of reactant molecules per unit time in the correct 
relative orientation for product formation. This means 
that the rates of most chemical reactions increase expo- 
nentially with increasing temperature. 

2. The sensitivity of biopolymer structure to heat destruc- 
tion. An example of the latter is heat-mediated denatura- 
tion of proteins and DNA. A good example of the con- 
sequence of these two factors is that enzyme-mediated 
reactions tail off at higher temperatures due to loss of 
structure (Figure 8.1). 


Heat is also an important aspect to the full description of 
chemical reactions because it helps us to understand energy 
flow within such reactions. The First Law of Thermodynam- 
ics states that energy cannot be created or destroyed, it can 
only change form during a chemical process. Heat generated 
or consumed can therefore, in principle, be directly mea- 
sured. Such an apparatus is called a calorimeter. The usual 
experimental approach is to maintain the reaction at a single 
temperature by the addition or removal of heat. The amount 


of heat requiring to be added or subtracted is automatically 
measured. Calorimetric measurements of biochemical reac- 
tions such as protein-DNA binding, protein-protein com- 
plex formation and protein unfolding is called biocalorime- 
try. In this chapter we will review the thermodynamics of 
biochemical systems and describe some examples of the 
main biocalorimetric experimental approaches. 


8.1 THE MAIN THERMODYNAMIC 
PARAMETERS 


8.1.1 Activation Energy of Reactions 


Heat is a form of energy. During a chemical reaction, bonds 
need to be broken (which releases energy) or to be made 
(which requires energy). For example, in formation of a 
peptide bond between two molecules of glycine: 


NH, — CH, — COOH + NH, — CH, — COOH — 


NH» — CH: — CO — NH — CH; — COOH + H20 68.2) 


the CO-OH bond of one amino acid and the NH-H bond 
of the other must be broken while a CO—NH peptide bond 
must be formed. If the process of bond-breaking requires less 
energy than that of bond-making, then the reaction is said to 
be exothermic. If the opposite is the case, the reaction is said 
to be endothermic. We can extend this description to other 
important biochemical processes such as protein folding and 
complex formation which may not involve formal changes 
in bonding structure. 

The collision theory of chemical reaction envisages a 
population of reactants with a wide range of chemical en- 
ergy values. Reaction only occurs between reactants when 
they collide and undergo bond breakage/formation to form 
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Figure 8.1. Effect of temperature on an enzyme-catalyzed reac- 
tion. Temperature increases the rates of chemical reactions includ- 
ing rates of enzyme-catalyzed reactions because it has an exponen- 
tial effect due to the Arrhenius relationship. Structural denaturation 
also increases with temperature though which leads to inactivation 
of enzyme. The temperature profile of enzymes results from these 
two processes. 
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the product. Only molecules with energy greater than the 
activation energy (E,) can successfully undergo reaction. 
This is characteristic for the reaction and associated con- 
ditions such as temperature and pH. If the molecules in 
a collision have less energy than E,, they simply bounce 
off each other without reacting. E, effectively provides a 
barrier to reaction called the potential energy barrier which 
can only be overcome by a proportion of reactant molecules. 
The number of molecules with energy above E,(n) obeys 
the Boltzmann distribution, 


n = noe ERT (8.3) 


where no is the total number of molecules in the popula- 
tion. This was used to formulate the Arrhenius equation 
shown above (Equation (8.1)). If we determine rate con- 
stants at a number of temperature values, we can use the 
Arrhenius equation to determine Ea from a plot of logk 
versus 1000/T (Figure 8.2) because the slope of this line 
has a value E,,/2.303R. The relationship between the en- 
ergy of reactants and E, is shown in Figure 8.3. Exothermic 
reactions release heat while endothermic reactions absorb 
heat and both types of reaction occur in Biochemistry. The 
highest-energy and least-stable species formed in the reac- 
tion pathway is called the transition state. Some reactions 
may have several activation energy barriers and there may 
be several transition states in such pathways. 
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Figure 8.2. Arrhenius plot of a chemical reaction. (a) Chemical reactions depend exponentially on temperature. (b) The activation energy 


(E,) can be calculated from the slope of an Arrhenius plot. 
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(b) 


Product 


Reactant 


Reaction coordinate 


Figure 8.3. Potential energy diagrams of reactions. Potential energy diagrams of reactions are shown. The activation energy barrier is 
denoted by E, while the enthalpy of the reaction is shown by AH. (a) An exothermic reaction results in positive AH. (b) An endothermic 


reaction results in negative AH. 


8.1.2 Enthalpy 


The energy produced or absorbed by a reaction is called 
the change in enthalpy, AH. In endothermic reactions the 
enthalpy value, H, of the product is higher than that of 
the reactants while in exothermic reactions the opposite is 
the case. Enthalpy is defined as 


H=U+pV (8.4) 


where U is the internal energy of the molecule and V its 
volume at pressure, p. 

In a chemical reaction occurring in a thermodynamically 
closed system (i.e. a system to which energy cannot be 
added or lost during the reaction), a change in both the 
internal energy, AU and the volume, AV, can occur as a 
result of the reaction and this is associated with a quantity 
of heat, q, either entering or leaving the system: 


AU =q — (pAV) (8.5) 
The enthalpy change, AH, is defined as 
AH = AU + pAV + VAp (8.6) 


where Ap and AV are the pressure and volume changes 
during the reaction, respectively. Combining Equations (8.5) 
and (8.6), 


AH =q+(VAp) (8.7) 
which means that, at constant pressure (Ap = 0): 
AH =q (8.8) 


AH values of this type can therefore be directly determined 
in an isobaric calorimeter by direct measurement of q. 


Enthalpy is an inherent property of every molecule un- 
der specific conditions of temperature, pressure and phase. 
Change of phase from solid to liquid to gas result in different 
inherent enthalpy values for a given molecule. Standardiza- 
tion and comparison are achieved by reporting data as stan- 
dard enthalpies of reaction (A H°) which are defined as the 
enthalpy change taking place when molar quantities react 
at 25 °C in a pressure of one atmosphere (1.013 x 10° Pa). 
Examples of this include the standard enthalpy of solvation 
(AH, where dipoles in the sample attract counter-ions 
in the solvent), solution (A Hg}, where one mole of sam- 
ple is completely dissolved) and formation (A Hĝ,m» when 
one mole of sample is formed from its constituent elements 
under standard conditions). 


8.1.3 Entropy 


Entropy (S) is another inherent thermodynamic value of all 
molecules including biomolecules. It may be conveniently 
thought of as a measure of disorder or randomness in the 
molecule or process. The Second Law of Thermodynam- 
ics states that a system will always undergo spontaneous 
change in such a way as to increase entropy. This is es- 
pecially relevant to biomacromolecules and to biological 
systems generally, since these are unusually highly ordered 
and therefore constantly experience a natural tendency to 
adopt more random forms. 

A completely crystalline solid with no vibrational move- 
ment would have zero entropy. As temperature is increased, 
so does the entropy level. Going from the solid to the liquid 
phase results in a large entropy change as does going from 
the liquid to the gaseous phase. Similarly, any process re- 
sulting in an increase in disorder such as dissolving a solid 
in a liquid solvent, increasing the number of molecules in a 
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chemical solution or unfolding a protein results in entropy 
increases. 

Absorption of heat by a chemical system results in an 
increase in disorder and hence an increase in entropy (AS). 
This value is the ratio of the heat absorbed to the absolute 
temperature: 


AS = (8.9) 


ps 


If T remains constant, such a process is said to be isother- 
mal. A consequence of the Second law of Thermodynamics 
is that A Sota: for a spontaneous process must be greater than 
0 where A Siotai is the sum of the AS for the specific system 
(e.g. the chemical reaction or process) plus the AS of its 
surroundings: 

A Sotal > ASsystem + A Ssurroundings (8. 10) 

If the reaction is exothermic, the system loses enthalpy 
to its surroundings. If the surroundings are sufficiently large 


that this does not result in an increase in temperature (which 
is usually the case in living cells), then 


AH 
ASsystem —-— >0 (8.11) 
T 
or 
TAS- AH>0 (8.12) 


Thus, spontaneous reactions will occur generally when 
AH < TAS. Endothermic reactions will occur provided 
the AS value is sufficiently large to make TAS greater than 
AH which is likely to be the case at higher temperatures. 


8.1.4 Free Energy 


Just as all molecules have intrinsic enthalpy and entropy val- 
ues at given conditions of temperature, pressure and phase, 
so too they all have different amounts of a specific type of 
energy defined as that energy available to do work. This is 
called the free energy and, in honour of the American sci- 
entist Willard Gibbs (1839-1903), denoted as G and some- 
times referred to as the Gibbs free energy. Spontaneous 
chemical reactions always result in decrease in G, that is a 
negative AG which, combined with Equation (8.12), gives 
us. 
AG = AH —TAS <0 (8.13) 
This reflects the fact that only a proportion of free energy 
is actually available to do work as some of it is swallowed 
up in entropy changes. 


8.2 ISOTHERMAL TITRATION 
CALORIMETRY 


Biocalorimetry allows direct measurement of enthalpy 
changes associated with chemical reactions and processes. 
Two main types of calorimetry are in widespread use nowa- 
days: Isothermal titration calorimetry (ITC) and differential 
scanning calorimetry (DSC; Section 8.3). ITC may be used 
directly to measure AH changes associated with processes 
such as ligand binding and complex formation. DSC is ap- 
propriate for the study of heat-initiated phase changes in 
biopolymers and gives measurements of AH values as well 
as other thermodynamic parameters. 


8.2.1 Design of an Isothermal Titration 
Calorimetry Experiment 


An outline design of an ITC calorimeter is shown in Figure 
8.4. This system is useful in the protein concentration range 
of mg/ml. A typical experiment involves measurement of 
heat change as a function of addition of small quantities of 
a reagent to the calorimeter cell containing other compo- 
nents of the system under investigation. For example, this 
reagent could be a protein ligand or substrate/inhibitor of 
an enzyme. At the beginning of the experiment, there is 
a large excess of protein compared to ligand. This means 
that AH values associated with each aliquot can be in- 
dividually measured. Initially, these values are large but, 
as aliquots are progressively added, eventually decrease to 
values similar to the AH of dilution of ligand into the solu- 
tion in the calorimeter cell (Figure 8.5). The AH measured 
is the total enthalpy change which includes heat associ- 
ated with processes such as formation of noncovalent bonds 


Syringe for 
injecting protein 
ligand 


Reference cell 
(buffer only) 


Sample cell 
(containing protein) 


Figure 8.4. Outline design of an isothermal titration calorimeter. 
Addition of a substance such as a protein ligand into the sample 
cell causes a release or absorption of a small amount of heat. This 
is detected and analysed electronically. Several such injections are 
made during a typical experiment. 
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Figure 8.5. A typical ITC experiment. Small aliquots of a drug 
capable of interacting with DNA are sequentially injected into the 
calorimeter sample cell which contains DNA. Aliquots of heat re- 
sulting from each injection are accurately measured (upper panel). 
After subtracting the heat of solution for each injection (inverted 
spikes in upper panel), a binding isotherm may be constructed for 
this process (square points in lower panel). Reprinted from Ladbury 
and Chowdhry (1998). Biocalorimetry: Applications of calorimetry 
in the Biological Sciences. Copyright John Wiley & Sons Limited. 


between interacting molecules and with other equilibria in 
the system such as conformational changes, ionization of 
polar groups (e.g. deprotonation) and changes due to in- 
teractions with solvent. Frequently, the results are reported 
as ‘apparent A H’ and compared to other experiments per- 
formed under identical conditions to allow identification of 
specific effects. This is an important point because such 
measurements are readily comparable to DSC data (Section 
8.3) which can distinguish binding effects more precisely 
from related conformational/solvent effects. 


8.2.2 ITC in Binding Experiments 


ITC provides a useful method for studying binding processes 
such as those involving a protein (P) and a ligand (L): 
P+L< PL (8.14) 


It allows estimation of both the binding constant (Kg) 


= PU (8.15) 
e PILI l 
and of the dissociation constant (Kp); 
ga (8.16) 


[PL] 
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Kp is related to AG by; 


AG = —RT In Kp (8.17) 


Because the AH for the binding interaction decreases 
as the system approaches equilibrium, it is a sensitive probe 
for the extent of the binding interaction. Subtraction of com- 
ponents such as heats of dilution gives a binding isotherm 
which may be used to calculate Kp or Kg (Figure 8.5). It 
is possible for binding of two individual ligands to a single 
protein to have similar Kp values (and consequently similar 
AGs; Equation (8.17)) but quite different AH and AS val- 
ues. ITC allows direct determination of AH and therefore 
helps confirm detailed differences in binding thermodynam- 
ics (Example 8.1). 


8.2.3 Changes in Heat Capacity Determined by 
Isothermal Titration Calorimetry 


The heat capacity (also known as the specific heat) of a 
material is a measure of its ability to absorb heat. Itis defined 
as Cp (the subscript denotes constant pressure) the quantity 
of heat which will change the temperature of exactly 1g 
of a substance by 1°C. For example, the specific heat of 
water is 1 cal/g°C while that of ethanol is 0.58 cal/g °C. 
This means that if identical weights of water and ethanol 
absorbed the same amount of energy, the temperature of 
ethanol would rise significantly higher than that of water. For 
most reactions involving low molecular mass reagents, the 
values for C, of reactants and products are not significantly 
different. 

However, for large molecules such as proteins, there may 
be significant changes in C, in processes involving large 
biomacromolecules. Water associated with the surfaces of 
such biopolymers behaves differently from bulk water. This 
is even truer of processes resulting in the exposure of hy- 
drophobic regions such as protein unfolding which exposes 
hydrophobic parts of the protein chain to solvent water 
(Chapter 6). These molecules are held relatively rigid by 
hydrophobic interaction with the protein. When the temper- 
ature is raised, this rigidity is considerably reduced resulting 
in a large increase in C, for the denatured state versus the 
native state. A converse effect is important during complex 
formation as considerable parts of a protein surface may 
be removed from direct contact with solvent as a result of 
complex formation resulting in large decreases in Cp. In 
fact, there is a direct and experimentally useful correlation 
between AC, and surface area removed from solvent during 
complex formation. 

ITC can be used to determine changes in heat capacity as 
temperature is increased since the instrument can be oper- 
ated in the temperature range 0-60 °C. The variation in Cp, 
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Example 8.1 Isothermal Titration Calorimetry in Study of Oligomerization of Vancomycin 


Vancomycin is a peptide antibiotic which is thought to function partly by oligomerization into dimers and larger polymers. 
This was revealed by the construction of ligand binding curves at a range of concentrations in the presence of presumed 
target molecules such as a small peptide as shown below in the panel on the right. 
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Source: Isothermal titration curves of 0.12 nM, 0.369 mM and 1.11 mM vancomycin in the presence of a target peptide (left) were used to plot binding 


curves (right). Reprinted from Ladbury and Chowdhry (1998). Biocalorimetry: Applications of calorimetry in the Biological Sciences. Copyright John 
Wiley & Sons Limited 


These ligand binding curves were fitted to a simple dimerization model. From these measurements, the AH and Kg 
for dimerization were calculated and it was shown that this situation could be distinguished from oligomerization by ITC 
measurements alone. 

From Cooper, A. (1998). Microcalorimetry of Protein-Protein Interactions in, Labbury, J.E. and Chowdhry, B.Z. 
(eds.) Biocalorimetry Applications of Calorimetry in the Biological Sciences. John Wiley & Sons Ltd, reproduced with 
permission. 


enthalpy and entropy at two different temperatures, T; and 
T is given by the following equation: 


AC = AHr, — AHr, 
T -T 


AS DS, 
In (74/T;) 


(8.18) 


Plotting AH versus temperature gives a slope corresponding 
to AC, (Figure 8.6). 

Measurement of AC, by ITC therefore provides a gen- 
erally useful method for probing structural effects during 
processes involving biopolymers such as complex forma- 
tion, protein folding and DNA duplex formation (Exam- 
ple 8.2). The literature cited in the Bibliography at the end 
of this chapter gives examples of experimental applications 
of ITC. 


Slope = AC, 
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Figure 8.6. Determination of specific heat from enthalpy data. A 
plot of AH versus T is a straight line with slope of ACp. 
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Example 8.2 Isothermal Titration Calorimetry in Study of DNA Duplex Formation 


Since formation of duplexes of DNA from complementary strands depends on a number of factors such as hydrogen 
bonding and base stacking, it is a highly temperature-sensitive process. Duplex formation was studied in an experimental 
system involving complementary oligonucleotides which permitted investigation of a range of experimental variables 
such as salt concentration. The experiment involved injecting a small amount of one oligonucleotide into a bulk solution 
of the complementary oligonucleotide followed by measurement of the AH of duplex formation. The results shown 
below were obtained at (a) 292.8 K and (b) 312.4 K. Binding curves (bottom panels) were constructed from the ITC data 


(upper panels). 
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M (Continued) 


Source: Plot of AH of duplex formation determined by ITC versus T 


associate to form duplex helices are coupled processes. 


Source: Comparison of ITC profiles for duplex DNA formation at two different temperatures 


AH values of —74.4 and — 102.5 kcal/mol were determined for (a) and (b). A H values were plotted against temperature 
(below) and the slope of this line allowed calculation of AC, of —1.3 kcal K~! mol duplex™!. 


These data were combined with DSC studies to propose that formation of single-strand DNA helices which then 


Reprinted with permission from Holbrook et al., Enthalpy and Heat Capacity Changes for Formation of an Oligomeric 
DNA Duplex: Interpretation in Terms of Coupled Processes of Information and Association of Single-stranded Helices. 
Biochemistry, 38, 8409-8422, © (1999) American Chemical Society. 


8.3 DIFFERENTIAL SCANNING 
CALORIMETRY 


DSC is the second major type of biocalorimetry experiment 
and involves measurement of changes in C, as a function of 
change in temperature. It is especially useful in determin- 
ing thermodynamic parameters of phenomena initiated by 
an increase or a decrease in temperature. The effects of dif- 
ferent experimental conditions such as pH, ionic strength, 
solvent identity, presence/absence of ligands on these pa- 
rameters can be studied by DSC. The technique is especially 
sensitive to phase changes in biopolymers arising during 
phenomena such as protein-protein binding, protein unfold- 
ing and ligand binding. Because very small AC, values 
are usually measured, it is necessary to compare a solu- 
tion of the sample to a reference solution containing only 
buffer. Moreover, repeated scans need to be made over a 
period of time to build up sufficient data for accurate Cp 
determination. 


8.3.1 Outline Design of a Differential Scanning 
Calorimetry Experiment 


An outline design of the experimental apparatus used is 
shown in Figure 8.7. DSC involves measuring tiny differ- 
ences in heat (hence differential scanning calorimetry) be- 
tween sample solutions (e.g. 2 ml of a 1-5 mg/ml protein) 
in the sample cell compared to reference solutions (e.g. 
buffer) in the reference cell. These two solutions are elec- 
trically heated (up-scans) or cooled (down-scans) and the 
additional current required to heat/cool the protein solu- 
tion is recorded. The solutions are then cooled/heated and 
the experiment repeated several times. The scans (hence 
differential scanning calorimetry) are summed to give an 
accurate measure of enthalpies of processes such as pro- 
tein unfolding, ligand binding or complex formation. Most 
DSC calorimeters operate in the temperature-range 1-98 °C 
although some operate over a wider range than this. It is 


usual to perform the experiment over as wide a concentra- 
tion range as possible, at a variety of scan-rates (i.e. °K/h) 
and under a variety of experimental conditions such as pH, 
ion strength, and so on A typical DSC signal is shown in 
Figure 8.8. This results from subtraction of a significant 
baseline as shown in Figure 8.9. The area under the curve 
is the AH and the melting temperature (Tm) and AC, are 
estimated as shown. These thermodynamic parameters ide- 
ally should show no dependence on scan rate or sample 
concentration. 
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Figure 8.7. Outline design of a differential scanning calorimeter. 
The outline design of a Microcal MC-2 calorimetric unit is shown. 
Reprinted from Ladbury and Chowdhry (1998). Biocalorimetry: 
Applications of Calorimetry in the Biological Sciences. Copyright 
John Wiley & Sons Limited. 
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Figure 8.8. DSC of ubiquitin. A typical DSC result for the thermal 
denaturation of ubiquitin (5 mg/ml) is shown. The melting temper- 
ature is denoted by Tm while the change in specific heat (A Cp) is 
calculated as shown Reprinted from Ladbury and Chowdhry (1998). 
Biocalorimetry: Applications of calorimetry in the Biological Sci- 
ences. Copyright John Wiley & Sons Limited. 


DSC is really only appropriate to large molecules because 
these give measurable AC, values and this is an important 
point of difference with ITC. The technique complements 
independent spectroscopic methods such as circular dichro- 
ism (Figure 8.9). 
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Figure 8.9. DSC of ribonuclease. In arriving at a final DSC profile, 
the baseline (B) must be subtracted from the thermal denaturation 
profile of the protein (in this case RNAase from Bacillus amy- 
loliquifaciens). Circular dichroism (C) confirms that the melting 
temperature (Tm) occurs at the peak of the DSC signal. Fersht, 
A. (1999). Structure and Mechanism in Protein Science, A Guide 
to Enzyme Catalysis and Protein Folding. W.H. Freeman & Co, 
reproduced with permission. 
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8.3.2 Applications of Differential 
Scanning Calorimetry 


DSC has been used in the study of a wide range of bio- 
chemical phenomena. In general, any process involving 
a temperature-induced change of phase in a biomacro- 
molecule is amenable to study by this technique. These 
would include double-to single-strand transitions in DNA, 
unfolding of globular proteins and processes involving 
multi-domain/multi-subunit proteins. It is possible to vary 
the experimental conditions used such as pH, ion strength or 
by including stabilizing molecules. Moreover, comparison 
of data for mutant and wild-type proteins allows estimation 
of contributions from single residues to the thermodynam- 
ics of specific processes. DSC can also be used to resolve 
effects involving different protein domains and hence as a 
tool to detect interdomain cooperativity. The technique is 
not limited to proteins and has been widely used in studies 
of polysaccharides and even nonbiological polymers. 


8.4 DETERMINATION OF 
THERMODYNAMIC PARAMETERS 
BY NON-CALORIMETRIC MEANS 


8.4.1 Equilibrium Constants 


For any chemical process the equilibrium constant, Ka, is 
simply the ratio of concentration of reactants to that of prod- 
ucts. Provided a means is available to quantify reactant and 
product concentrations separately, this parameter can be rou- 
tinely determined. For example, in Chapter 6 we have seen 
that spectroscopic methods such as fluorescence can be used 
to measure the fraction of a protein presented in its unfolded 
form under a variety of chemical conditions. These values 
are related to the enthalpy change of the reaction by the 
van’t Hoff equation: 


dinK, AH 


dT RT? oe 


This provides a noncalorimetric method for determination 
of AH which is independent of calorimetry. It can be used to 
confirm assumptions underlying analysis of the data such as, 
for example, showing that unfolding of a particular protein 
is a 2-state system consisting of various amounts of native 
and unfolded protein (Example 8.3). 

Other means for determining equilibrium and affinity con- 
stants include equilibrium dialysis (Chapter 7) and plotting 
fluorescence quenching data (Chapter 3). Comparison of 
such data with calorimetric measurements allows indepen- 
dent confirmation of details of binding as well as checking 
assumptions in modeling these processes. 
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Example 8.3 Differential Scanning Calorimetry in the Study of Protein Unfolding 


Fibroblast growth factors are cytokines which stimulate mitogenic and chemotactic responses in a wide range of cells. 
These effects arise from binding to receptors located on cells such as myoblasts, endothelial cells, chondrocytes and 
fibroblasts. Acidic fibroblast growth factor (aFGF) is the only one known to bind to all four characterized receptors. The 
binding of the protein to heparin renders it more stable to extremes of heat and pH and to degradation by proteolysis or 
oxidation. In the absence of heparin, the Tn of the protein is near-physiological while the presence of heparin increases the 
Tn by some 20 °C. The protein contains three cysteines and formation of disulfide bridges between these is responsible for 
inactivation in the absence of heparin. Treatment with reducing agents such as dithiothreitol regenerates active protein. 
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Source: DSC of aFGF under control, 10 mM phosphate and 10 mM sulphate conditions. Corresponding A Hea and A Hyg 
are shown. Reproduced with permission of John Wiley & Sons, Ltd from Adamek et al. (1998). Denaturation Studies 
of haFGF. In Ladbury, J.E. and Chowdhry, B.Z. (eds.) Biocalorimetry; Applications of Calorimetry in the Biological 
Sciences. 

In order to study this unfolding process further, the protein was exposed to low concentrations of the chaotropic agent, 
guanidinium hydrochloride and calorimetric data were collected by DSC. It was observed (see below) that addition of 
phosphate and sulfate resulted in a significant increase in the Tm. This is consistent with the structure of this protein 
determined by X-ray crystallography which shows a concentration of basic groups in a particular location containing a 
sulfate ion. Note that the AH determined by DSC (A Hea) is markedly lower than that determined from the van’t Hoff 
relationship (A Hyp; Section 8.4). This is consistent with the presence of multiple protein states or enthalpy contributions 
associated with aggregation or precipitation. 


REFERENCE calorimetry and differential scanning calorimetry as complemen- 
tary tools to investigate the energetics of biomolecular recogni- 
General Reading tion. Journal of Molecular Recognition, 12, 3-18. 


Ladbury, J.E. and Chowdhry, B.Z. (eds) (1998) Biocalorimetry; 
Applications of Calorimetry in the Biological Sciences, John 
Wiley & Sons, Ltd, Chichester. A comprehensive overview of 
the basic theory of both ITC and DSC with specific examples. 

Plum, G.E. and Breslauer, K.J. (1995) Calorimetry of proteins and 


Connelly, P.R. (1994) Acquisition and use of calorimetric data for 
prediction of the thermodynamics of ligand-binding and fold- 
ing reactions of proteins. Current Opinion in Biotechnology, 5, 
381-8. Review of use of calorimetric measurements in study of 


processes such as protein folding and ligand binding. 

Janin, J. (1997) Angstroms and calories. Structure, 5, 473-9. A 
thought-provoking exploration of the relationship between calori- 
metric measurements and structure in biomacromolecules. 

Jelesarov, I. and Bosshard, H.R. (1999) Isothermal titration 


nucleic acids. Current Opinion in Structural Biology, 5, 682-90. 
Reviews the contribution of calorimetry to elucidation of struc- 
tural stability and ligand interactions in proteins and DNA. 
Wunderlich, B. (1996) Thermal analysis of macromolecules — a 
personal review. Journal of Thermal Analysis, 46, 643-79. A 


historical review of the development of calorimetry in the study 
of polymers. 


Thermodynamics 


Holdgate, G.A. and Ward, W.H.J. (2005) Measurements of binding 
thermodynamics in drug discovery. Drug Discovery Today, 10, 
1543-50. A review of the role of thermodynamics in rational 
drug design. 

Orstan, A. (1990) Thermodynamics and life. Trends in Biochemical 
Sciences, 15, 137-8. 


Isothermal titration calorimetry 


Buurma, N.J. and Haq, I. (2007) Advances in the analysis of isother- 
mal titration calorimetry data for ligand-DNA interactions. Meth- 
ods, 42, 162-72. An article considering improvements in ITC 
data analysis. 

Feig, A.L. (2007) Applications of isothermal titration calorimetry 
in RNA biochemistry and biophysics. Biopolymers, 87, 293-301. 
A review of ITC as applied to RNA. 

Freyer, M.W. and Lewis, E.A. (2008) Isothermal titration calorime- 
try: Experimental design, data analysis and probing macro- 
molecule/ligand binding and kinetic interactions, in Methods 
in Cell Biology, Vol. 84 (eds J.J. Correia, and H.W. Detrich), 
Elsevier, pp. 79-113. A review of ITC. 

Oda, M., Furukawa, W., Sarai, A. and Nakamura, H. (1999) Con- 
struction of an artificial tandem protein of the c-Myb DNA- 
binding domain and analysis of its DNA binding specificity. Bio- 
chemical Biophysical Research Communication, 262, 94-7. An 
excellent example of use of ITC measurements in the study of 
protein-DNA binding. 

Chung, E., Henriques, D., Renzoni, D. et al. (1998) Mass spec- 
trometric and thermodynamic studies reveal the role of water 
molecules in complexes formed between SH2 domains and tyro- 
syl phosphopeptides. Structure, 6, 1141-51. An example of the 
combination of ITC with MS approaches to understanding the 
importance of water molecules in interactions between tyrosyl 
phosphopeptides and the SH? domain of tyrosine kinase. 


BIOCALORIMETRY 303 


Livingstone, J.R. (1996) Antibody characterization by isothermal 
titration calorimetry. Nature, 384, 491-2. Application of ITC to 
characterization of antibodies and related products. 

Turnbull, W.B. and Daranas, A.H. (2003) On the value of c: Can 
low affinity systems be studied by isothermal titration calorime- 
try? Journal of the American Chemical Society, 125, 14859-66. 
Study of the potential of ITC for study of low-affinity binding 
systems. 


Differential scanning calorimetry 


Bruylants, G., Wouters, J. and Michaux, C. (2005) Differential 
scanning calorimetry in life science: Thermodynamics, stability, 
molecular recognition and application in drug design. Current 
Medicinal Chemistry, 12, 2011-20. 

Fisher, H.F. and Singh, N. (1995) Calorimetric methods for in- 
terpreting protein-ligand interactions. Energetics of Biological 
Macromolecules, 259, 194-221. A review of study of ligand 
binding to proteins with DSC. 

Holbrook, J.A., Capp, M.W., Saecker, R.M. and Record, M.T. 
(1999) Enthalpy and heat capacity changes for formation of 
an oligomeric DNA Duplex: Interpretation in terms of cou- 
pled processes of formation and association of single-stranded 
helices. Biochemistry, 38, 8409-22. An excellent example of 
the combination of ITC and DSC with other approaches to re- 
solve DNA duplex formation into intrastrand, and interstrand 
effects. 

Robinson, C.R., Liu, Y.F., O’Brien, R. et al. (1998) A differential 
scanning calorimetric study of the thermal unfolding of apo- 
and holo-cytochrome b (562). Protein Science, 7, 961-5. An 
interesting application of DSC. 

Spink, C.H. (2008) Differential scanning calorimetry, in Methods 
in Cell Biology, Vol. 84 (eds J.J. Correia and H.W. Detrich), 
Elsevier, pp. 115-41. A review of DSC. 


A useful web site 


University of Washington calorimetry centre: http://mozart.bmsc. 
washington.edu/~hyre/calorimetry.html. 


Chapter 9 


Bioinformatics 


Objectives 


In Chapter 1 we introduced the worldwide web (www) 
as a resource for large-scale storage, sharing and analy- 
sis of information (Sections 1.5.1 and 1.5.2). In biochem- 
istry, a major category of stored information comprises se- 
quences of nucleotides (DNA) or amino acids (proteins). 
However, other types of useful information relevant to bio- 
chemistry are also available on the web. For example, ded- 
icated databases exist for two-dimensional SDS PAGE and 
MS, which are used in proteomics (Chapter 10). Specialist 
databases also exist for peptide and carbohydrate chem- 
istry and for various types of spectroscopy (Tables 9.1 
and 9.2). Similar organizing principles apply to the main 
databases used in molecular life sciences. Historically, indi- 
vidual databases arose independently of each other (which 
originally posed problems in navigating from one to the 
other), but they are now very closely interlinked. For ex- 
ample, it is possible to search the macromolecular struc- 
tural database (MSD; also called the protein databank or 
PDB, Section 9.4.2) for a particular amino acid sequence 
and to ‘translate’ a nucleotide sequence into an amino acid 
sequence. In a real sense these databases now comprise a 
closely interwoven ‘web’ of connections within the wider 
www. The very range and complexity of these interconnec- 
tions can be bewildering so, in this chapter, we will review 
the principal databases relevant to molecular life sciences 
and explain how we can use them to help us solve prob- 
lems involving physical techniques in biochemistry. Bioin- 
formatics is a dynamic and growing field and it is impos- 
sible to give it comprehensive coverage in a short chapter. 
Instead, it is hoped to describe selected resources (e.g. those 
listed in Tables 9.1-9.3), which the reader may find espe- 
cially useful in accomplishing practical tasks such as access- 
ing or analysing sequences or three-dimensional structures. 


After completing this chapter you should be familiar with: 


¢ The main sequence and three-dimensional structure archives available via the www. 

e How to access and download information from these archives. 

e How to analyse this information to extract functionally relevant data. 

e For primary structures, you will be able to align sequences, predict secondary structure, hydropathy 
and potential post-translational modification sites. For tertiary structures you will be able to view 
structures in a variety of ways and to carry out homology modelling. 


The Bibliography points the reader to some excellent texts, 
which give a more detailed picture. Because of the highly 
interconnected nature of resources on the www, there is fre- 
quently more than one way to achieve a particular objective 
(e.g. retrieving a sequence or viewing a structure) so the 
strategies outlined here are to some extent matters of per- 
sonal preference and should be regarded by the reader as 
intended merely to illustrate general possible approaches. 
All the resources described have extensive literature avail- 
able on-line to help the novice understand and exploit tools 
and archives to which they are linked so the reader is en- 
couraged to enter the web sites mentioned and to explore 
them. To illustrate just how informative and rewarding this 
can be, the aquaporin family of proteins will be used as an 
example throughout this chapter. These are a widespread 
group of transmembrane proteins which form channels for 
the transport of water across membranes of a wide range of 
cell types. Roderick MacKinnon and Peter Agre were hon- 
oured with the Nobel Prize in Chemistry (2003) for their 
work on elucidating structure-function relationships in this 
important protein family. 


9.1 OVERVIEW OF BIOINFORMATICS 


Until the late 1980s much of the information used in Bio- 
chemistry was stored in paper form either in books or re- 
search papers in the peer-reviewed literature. For example, 
an atlas of protein sequences was published annually to 
contain all the then-known amino acid sequences (Atlas 
of Protein Sequence and Structure; Margaret Dayhoff ed.). 
This meant that information was hard to find, comparisons 
were difficult and there was always a significant time-lag 
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Table 9.1. Selected bioinformatics web sites. 


Site 


Contents 


Url 


DNA database of Japan (DDBJ) 
National Center for Biotechnology 
Information (NCBI) 


European Bioinformatics Institute (EBI) 


Swiss Institute of Bioinformatics 
(ExPASy) 


Protein Information Resource 
(PIR)-international UniProt 


Munich Information Centre for Protein 


Sequences. 


Research Collaboratory for Structural 


Bioinformatics (RCSB) 


Cambridge Crystallographic Data Centre 


Table 9.2. Selected genome web sites. 


Site 


Nucleotide sequences 


Nucleotide sequences (GenBank), 
Entrez search and retrieval system, 
sequence/genome analysis tools, EST 


databases. 


and sequence analysis tools, 


Proteomics tools, Protein modelling 


(Swiss PDB viewer). 


Protein sequences, protein structure 


analysis tools. 
United protein encompassing 


SwissProt, PIR and TrEMBL. 


Genome resources. 


Files of atomic coordinates for 


macromolecules, Protein graphics 


programs 


Contents 


Nucleotide sequences (EMBL), protein 
sequences (TrEMBL and UniProt), 
sequence/structure analysis tools. 

Protein sequences (SwissProt and 
UniProt), Protein families (ProSite) 


Atomic coordinates of small molecules 
and tools for analysis/comparison 


http://www.ddbj.nig.ac.jp/ 
http://www.ncbi.nih. gov/ 


http://www.ebi.ac.uk/ 


http://us.expasy.org/ 


http://pir.georgetown.edu 
http://www.pir.uniprot.org/ 
http://mips.gsf.de/services/genomes 


http://www.rcsb.org 


http://www.ccde.cam.ac.uk/ 


Url 


The Sanger Centre 


National Center for Biotechnology 
Information (NCBI) 

European Bioinformatics Institute 
(EBD) 

Protein Information Resource 
(PIR) 

University of Bordeaux, 
Genolevures. 

Munich Institute for Protein 
Sequences. 

Fungal Genome Resource 

The Genome Web 


Institute of Genome Research 
(TIGR). 


Access to a wide range of microbial and 
eukaryotic genomes including the Human 
genome and genome searching tools. 

Access to completed and partial genomes and 


tools for searching and analysis. 


Access to completed and partial genomes and 


tools for searching and analysis. 
Access to completed genomes. 


Partial and complete genomes of 
hemiascomycetous yeasts. 


Microbial, yeast, plant and mitochondrial 


genomes. 
Fungal genomes 


Comprehensive listing of genome links. 


Genome databases and analysis tools 


http://www.sanger.ac.uk 


http://www.ncbi.nih.gov/Genomes/ 
index.html 

http://www.ebi.ac.uk/genomes/ 
index.html 

http://pir.georgetown.edu/pirwww/ 
search/genome.html 

http://cbi.labri.u-bordeaux.fr/ 
Genolevures/ 

http://mips.gsf.de/ 


http://gene.genetics.uga.edu/ 

http://www.cbi.pku.edu.cn/mirror/ 
Genome Web/genome-db.html 

http://www.tigr.org/htdig/ 


Table 9.3. Selected bioinformatics tools. 
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Site Tool Url 
NCBI BLAST http://www.ncbi.nlm.nih.gov/BLAST/ 
PIR FastA http://pir.georgetown.edu/pirwww/search/fasta.html 
PIRSF http://pir.georgetown.edu/pirsf/ 
iProClass http://pir. georgetown.edu/iproclass/ 
EBI Clustal W http://www.ebi.ac.uk/clustalw/index.html 
PANDIT http://www.ebi.ac.uk/goldman-srv/pandit/ 
SRS http://srs.ebi.ac.uk 
Weizmann Inst. Israel Hydropathy plots http://www.ebioinfogen.com/cgi-bin/webframe.pl? 
http://bioinformatics.weizmann.ac.il/ 
PBIL-IBCP, Lyons, France GOR IV (Secondary http://npsa-pbil.ibcp. fr/cgi-bin/npsa_automat.pl? 
Structure) page=npsagor4. html 
GENO3D http://pbil .ibcp.fr/htm/index.php?page=pbil index.html 
Columbia University, USA Predict Protein server http://cubic.bioc.columbia.edu/pp/submit_exp.html#top 
(PROF) 
Boston University, USA PSA structure server http://bmerc-www.bu.edu/psa/ 
Sanger Institute, UK Pfam http://www.sanger.ac.uk/Software/Pfam/ 
ExPASy PROSITE http://us.expasy.org/prosite/ 


Swiss-Pdb Viewer 
SWISS-MODEL 


Berkeley University, USA SCOP Database 
University College London, UK CATH Database 
MDL Chime 
NCBI software page RasMol 
RasTop 
PyMOL 
WebMol 
UniGene 


University of California, San 
Diego, USA 
German Cancer Research Centre 


Protein Explorer 


Multi-GIF Page 


University of California, San MODELLER 
Francisco, USA 

University of Namur, Belgium ESyPred3D 

Lawrence Livermore Natl. PROCHECK 


Laboratory, USA 


between discovery and publication of knowledge. The de- 
velopment of computers made possible rapid recovery, com- 
parison and analysis of very large amounts of information in 
almost real time. Bioinformatics is the application of com- 
puter science to the solution of biological problems. Up to 
the mid-1990s this was the preserve of specialists trained 
in fields such as computer science, structural chemistry and 
advanced mathematics. However, the explosion in computer 
power (larger memories, faster processors, better and faster 
programs) in the late 1990s have made bioinformatics in- 
creasingly part of the ‘toolkit’ essential for modern molec- 
ular life scientists. Spin-offs of this include large-scale se- 
quencing projects such as the human genome project (HGP), 
the unprecedented and continuing increase in structures held 
in the PDB and an activity known as database mining where 
scientists use databases rather than conventional screening 


http://us.expasy.org/spdbv/ 
http://swissmodel.expasy.org/ 
http://scop. berkeley.edu/data/scop.b.html 
http://www.biochem.ucl.ac.uk/bsm/cath/ 
http://www.mdli.com/products/framework/chemscape/ 
http://www. bernstein-plus-sons.com/software/rasmol/ 
http://www. geneinfinity.org/rastop/ 
http://pymol.sourceforge.net/ 
http://www.cmpharm.ucsf.edu/“walther/webmol.html 
http://www.ncbi.nih.gov/entrez/query.fcgi?db=unigene 
http://molvis.sdthat is to say, 
namelyedu/protexp1/frntdoor.htm 
http://www.dkfz-heidelberg.de/spec/pdb2mgif/ 
http://salilab.org/license-server/license.cgi 


http://www.fundp.ac.be/urbm/bioinfo/esypred/ 
ftp://www-structure.In].gov/Procheck_NT/ 


programs to discover potential new drugs or new families of 
proteins. Bioinformatics also supports the new field of sys- 
tems biology, which attempts to link genomics, proteomics 
and other forms of organized biological information to al- 
low greater understanding of functioning at cell-, tissue- 
or organ-level arising from complex interactions between 
multiple elements. 

It can be helpful to divide up bioinformatics resources 
into three distinct components: web sites, databases and 
programs (Figure 9.1). Web sites represent a single ‘loca- 
tion’ on the www curated by a scientifically reputable entity 
such as a university, a national or supranational governmen- 
tal organization, a professional body or a combination of 
several of these. Examples include the European Bioinfor- 
matics Institute (EBI; an outstation of the European Molec- 
ular Biology Laboratory located near Cambridge, England), 
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Figure 9.1. Overview of bioinformatics resources on the www. Three types of resources are shown: web sites, programs and databases 
together with possible routes taken by three separate users (bold arrows). Notice that the same databases can be accessed via different web 
sites and that user 1 can access the same nucleotide sequence databases through different web sites. Similarly users 2 and 3 can arrive at the 
same database by different routes. Where programs are unique to a web site this is indicated. 


the National Center for Biotechnology Information (NCBI; 
run by the National Library of Medicine, National Institutes 
of Health, Bethesda, Maryland, USA) and the ExPASy site 
(run by the Swiss Institute of Bioinformatics). These sites 
act as links to the principal databases and software resources 
for submission, retrieval and analysis of database entries. In- 
formation relevant to each site may be submitted (often as a 
precondition for publication in the peer-reviewed literature) 
by a means standard for each site. The data are converted 
into a uniform format and stored in a unique file. Collec- 
tions of these files are called archives which are searchable 
by a file identifier and by a variety of other means. This 
searchable nature of databases makes them especially use- 
ful in molecular life sciences. They can commonly be very 
rapidly searched by word, author (providing an important 
link to the peer-reviewed literature) or by similarity to other 
entries. Some examples of such searches are given below. 
A list of files is generated which is usually arranged in or- 
der of similarity to the search-term(s). Where several similar 
databases are searched with the same search term(s) we com- 
monly get slightly differing numbers of ‘hits’ so it is always 
worthwhile trying several databases and several searches at 


the beginning of any bioinformatics project. Most web sites 
also contain access to programs which allow us to make 
comparisons or carry out further analysis of our target files. 
Sometimes these programs can be downloaded to our own 
PC or we can access them in the web site on-line. Some pro- 
grams are found at more than one web site and these would 
usually represent the most widely-accepted approach to the 
particular analysis. Thus the BLAST tool (Section 9.3.1) is 
the standard program for carrying out rapid alignment of 
sequences while Clustal W (Section 9.3.1) is currently the 
standard program for analysing phylogenic relationships be- 
tween sequences. 

At a research level bioinformatics concerns itself with 
developing better, faster and more informative programs 
but many of these are not generally accessible or publicly 
available. In this chapter we will only concern ourselves 
with publicly available programs, which are free of charge 
and which work in a Windows operating system. For more 
powerful programs access to a dedicated workstation may 
be needed. These often use the UNIX operating system. It 
is possible to install UNIX on a PC and even to partition a 
PC hard-drive between Windows and UNIX. Some of the 


programs described here can run on Macintosh PCs just as 
well as Windows and the web sites usually provide clear 
guidance on this. 


9.2 SEQUENCE DATABASES 


Some of the most useful information in biochemistry comes 
from the sequences or primary structures of biomacro- 
molecules. Since DNA and proteins are informational 
molecules, their ultimate shape and hence biological prop- 
erties arise largely from their primary structures. Originally, 
individual databases were developed for proteins and nu- 
cleic acids in different parts of the world. Nowadays there 
are close links between all sequence databases, which make 
it possible to navigate between them. Figure 9.1 also il- 
lustrates that it is possible to search a database from more 
than one web site further emphasizing their high level of 
interconnectedness. Some of the principal bioinformatics 
web sites are listed in Table 9.1 and the sequence databases 
they curate are discussed individually below. The web sites 
also host a wide variety of powerful programs (often re- 
ferred to as tools), which allow scientists both to submit, 
retrieve and analyse sequence files on-line. Although web 
sites may have common tools and generally similar organi- 
zation there can be differences in emphasis. The NCBI site 
is mainly concerned with nucleotide and genome sequences 
while ExPASy is more concerned with protein sequences 
and their analysis. The EBI site has extensive resources for 
both nucleotide and protein sequence analysis. The Entrez 
search and retrieval system at NCBI is a particularly power- 
ful tool for finding protein sequence information, notwith- 
standing the emphasis NCBI otherwise places on nucleotide 
sequence. 


9.2.1 Nucleotide Sequence Databases 


The principal nucleotide sequence databases are GenBank 
(NCBI), EMBL (EBI) and the DNA database of Japan 
(DDBJ). These currently contain in excess of 60 000000 in- 
dividual DNA sequences which have been sequenced in both 
strands with overlapping sequence runs (Section 5.4). These 
data come from large-scale genome sequencing projects, 
patent applications and sequences submitted by individual 
researchers. All three databases exchange data with each 
other on a daily basis to give a comprehensive up-to-date col- 
lection of nucleotide sequences accessible at EMBL (Figure 
9.2). It should be borne in mind that some of the sequences 
in the nucleotide sequence databases may not have been 
published in the peer-reviewed literature. Each entry in the 
database has a unique identifier called an accession number. 
When the database accepts a sequence as meeting its quality 
requirements, this number, which is unique to the entry, is 
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issued. These are usually quoted in any peer-reviewed sci- 
entific publication and, whether published or unpublished, 
users can access these files directly on-line. 


9.2.2 Protein Sequence Databases 


One of the oldest and best resources for protein sequences is 
the Protein Information Resource (PIR) located at George- 
town University, Washington DC. The other principal pro- 
tein sequence databases are SwissProt and TrEMBL, a 
database comprising sequences translated from the EMBL 
nucleotide database. PIR now operates an international 
collaboration with EBI and the Swiss Institute of Bioin- 
formatics and other sequence databases offering a very 
high level of annotation of protein sequence files (Fig- 
ure 9.3). PIR hosts a single database updated every three 
weeks called UniProtKB (Table 9.1; Figure 9.4) which 
has two sections; UniProtKB/SwissProt (~360 000 entries) 
and UniProtKB/TrEMBL (~5 500 000 entries). UniProtKB/ 
SwissProt uses human annotation with minimum redun- 
dancy whilst UniProtKB/TrEMBL uses computer-based an- 
notation with a high level of redundancy, partly explain- 
ing the discrepancy in the number of entries. Some other 
databases which deal with how primary structure informa- 
tion can be further categorized/analysed in terms of sec- 
ondary structure, post-translational modification and spe- 
cific biological properties are discussed below and listed in 
Table 9.3. 


9.2.3 Genome Databases 


Availability of greater computer power coupled with de- 
velopments in robotics and availability of automated meth- 
ods for sequencing large amounts of DNA allowed comple- 
tion of large-scale genome sequencing projects over the last 
15 years. The most famous of these is the Human Genome 
Project, which sequenced all the genes encoded in the 
~3200 million base pairs and approximately 2 m per cell of 
human DNA. This project was completed ahead of schedule 
by an eventual collaboration between a public domain aca- 
demic human genome project consortium and a commercial 
company Celera (which means ‘swift’) (Figure 9.5). The 
complete genomes of many other representative organisms 
are now available. The smallest genomes, those of viruses, 
can be <200000 base pairs in length while the genome 
of Escherichia coli consists of 4.6 million base pairs. We 
now know the sequences of hundreds of other prokaryotes 
and archaebacteria. Many of these are pathogenic organisms 
(e.g. causative agents of bubonic plague, diphtheria, pneu- 
monia, etc.) or species of agronomic or biotechnological 
importance (e.g. lactobacilli). In bacteria, the bulk of DNA 
encodes genes while most eukaryotic DNA is noncoding. 
Selected URLs devoted to genomes are listed in Table 9.2. 
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Figure 9.2. The EMBL database of nucleotide sequences. (a) The database can be searched in a number of different ways but the sequence 
retrieval server (SRS; http://srs.ebi.ac.uk) allows us to search by text. Typing ‘aquaporin’ into the query box selects all files held for 
aquaporins. (b) A listing of hits is provided. (c) and (d) We can select just one of these to view in detail. 


Computer programs are used to identify open reading 
frames (ORFs) which are candidate genes beginning with 
a start codon (ATG) and ending with a stop codon. Bac- 
terial genes are easier to identify as they have continuous 
sequences unlike most eukaryote genes which are inter- 
rupted by introns. Eukaryote ORFs are defined as potential 
genes based on similarity with previously-identified genes 
from other organisms (homology) or based on identification 
of characteristic sequence features. A TATA box is usually 
found approximately 30 base pairs upstream of the start 
exon ending immediately before a GT splice signal. Internal 
exons begin immediately after an AG and end immediately 
before a GT splice signal. The 3’ exon starts immediately af- 
ter an AG splice signal and ends with a termination codon. 


Species-specific codon usage preference provides a yard- 
stick for distinguishing coding from noncoding regions of 
genomes. 

Two general approaches may be taken to assembling the 
sequence of a complete genome (Figure 9.5). One approach, 
taken by the public domain human genome project consor- 
tium, was to generate a ‘map’ of highlights of the genome 
and to fill in the relationship of other parts of the genome 
to these highlights by matching overlapping pieces of DNA 
sequence in a hierarchical manner. Maps of the genome 
of eukaryotes begin with the physically discrete individual 
chromosomes. For some genes genetic maps could be gen- 
erated by quantifying frequency of coinheritance or link- 
age of characteristics encoded by pairs of genes. But, for 
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Figure 9.3. The Protein Information Resource (PIR). (a) The PIR homepage, UniProt can be accessed here by selecting “UniProtKB” (i.e. 
UniProt KnowledgeBase). Note that other resources such as PIRSF and iProClass are accessible here also. (b) This is searchable by text 
using the pulldown menu; an aquaporin search is shown. (c) A list of hits is generated. 
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(a) * Reviewed, UniProtKB/Swiss-Prot P41181 (AQP2_HUMAN) E Cette aa 
Last modified November 25, 2008, Version 85. EJ History... ic 
Si Clusters with 100%, 90%, 50% identity |”) Documents (5) | L) Third-party data |} Customize display [XML] [GFF] 
Names and origin Protein attributes - General annotation (Comments) Ontologies Sequence annotation (Features) Sequences References Web resources : Cross-references - Entry information - Relevant 
documents 
Names and origin Hide | Top 
Protein names Recommended name: 
Aquaporin-2 
Short name=AQP-2 
Alternative name(s): 
Aquaporin-CD 
Short name=AQP-CD 
Water channel protein for renal collecting duct 
ADH water channet 
Collecting duct water channel protein 
WCH-CD 
Gene names Name: AQP2 
Organism Homo sapiens (Human) 
Taxonomic identifier 9606 [NCBI 
Taxonomic lineage Eukaryota > Metazoa > Chordata > Crantata » Vertebrata » Euteleostomi + Mammalia > Euthena » Euarchontogiires » Primates » Haplorrhini » 


Catarrhini > Hominidae » Homo 


Protein attributes Hide | Top 


Sequence length 271 AA. 

Sequence status Complete. 

Sequence processing The displayed sequence is not processed. 
Protein existence Evidence at protein level. 


(UIE General annotation (Comments) Hide | Top 


Function Forms a water-specific channel that provides the plasma membranes of renal collecting duct with high permeability to water, thereby 
permitting water to move in the direction of an osmotic gradient. 
Subcellular location Apical cell membrane, Multi-pass membrane protein. Cytoplasmic vesicle membrane, Multi-pass membrane protein. Nole= Shuttles from 
vesicles to the apical membrane 
Tissue specificity Expressed in renal collecting tubules. 
Domain Aquaporins contain two tandem repeats each containing three membrane-spanning domains and a pore-forming loop with the signature 
motif Asn-Pro-Ala (NPA). 
Post-transiational modification Ser-256 phosphorylation is necessary and Sufficient for expression at the apical membrane. Endocytosis is not phosphorylation-dependent 
Ivovement if disease Defects in AQP2 are a cause of autosomal nephrogenic diabetes insipidus, type | (AD-ND}) (MIM: 125800]. It is characterized by excessive 
water drinking (polydypsia) and urine excretion (polyuria) 
Sequence similarities Belongs to the MiP/aquaporin (TC 1.4.8) family. [View classification] 
(c) Ontologies Hide | Top 
Keywords 
Biological process Transport 
Cellular component Cell membrane 
Cytoplasmic vesicle 
Membrane 
Coding sequence diversity Polymorphism 
Disease Diabetes insipidus 
Disease mutahon 
Domain Repeat 
Transmembrane 
PIM hy Glycoprotein 
Phosphoprotein 
Gene Ontology (GO) 
Biological process excretion 
Traceable author ctaterert Source Protinc 
water transport ==" 
Traceable author statemert. Source: Pretinc 
Cellular component apical plasma membrane 
inferred from sequence or structural simianty Source Uni>mtkKD 
cytoplasmic vesicle membrane 


Inforred from stectrorec annotation Source- UniPratkB-SubCel 


integral to membrane 


Inferred fiom electronic annotation. Source: InterPro 


Molecular function water channel activity 


Traceable author statereet Seure Protine 


Figure 9.4. The UniProt database. (a) The entry for human aquaporin 2 (P41181). (b) General annotation section of the file describing 
attributes such as likely functions and subcellular location. (c) Ontologies section of the file giving keywords and gene ontologies. (d) 
Sequence annotation section of the file describing domain and residue functions/modification. (e) A listing of naturally occurring and 
artificial mutagenesis with consequences to function. 
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(a) 


Feature key Position Length Description Graphical view Feature 
(s) identifier 

Molecule processing 

O Chain 1-271 271 Aquaporin.2 PRO_0000063934 

Regions 

C Topological domain 1-16 16 Cytoplasmic! P=" 

Transmembrane 17-34 416 | [menia] = 
Topological domain 35-40 6 Extracellular [=== | — 

O Transmembrane 41-59 19 | [Betentet] — 

O Topologicaldomain 680-85 26 Cytoplasmic ===] 

C Transmembrane 86-107 22 | omie] — 

E Topological domain 108—127 20 Extracellular [===] —“H 

O Transmembrane 128—148 21 [Eeema] 
Topological domain 149-156 8 Cytoplasmic | P's 

O Transmembrane 157 -176 20 | Eia] — 

O Topological domain 177—202 26 Extracellular | "===! — 

Transmembrane 203—224 22 | 2a] 

O Topological domain 225-271 47 Cytoplasmic E=== | 
Motif 68-70 3 NPA1 
Mobf 184 — 186 3 NPA2 

Amino acid modifications 
Modified residue 256 1 Phosphoserine; by PKA 
Modified residue 261 1 Phosphoserine =: ==] 
Modified residue 1 Phosphoserine “#7 
Glycosylation 123 1 N-linked (GIcNAc...) = = - 

(e) Natural variations 

Natural variant 22 1 L— V inAR-NDI VAR 015239 
Natural variant 28 1 L— P in AR-NDI. VAR_015240 
Natural variant 47 1 A — V in AR-NDI VAR_015241 
Natural variant 57 1 Q — P in AR-NDI. dbSNP rs28931580. — ~ _VAR_016256 
Natural variant 64 1 G—RinAR-NDIL VAR_004401 
Natural variant 68 1 N-—SinAR-NDI VAR_015242 
Natural variant 71 1 V—MinAR-NDI — «(VAR 018203 
Natural varianl> 100 1 G—VinAR-NDI VAR_015257 
Natural variant 121 1 L — F: dbSNP rs11169226. VAR_037577 
Natural variant 125 1 T— Min AR-NDL VAR_015244 
Natural variant 126 1 T-—+MinAR-NDI VAR_015245 
Natural variant 147 1 A—TinAR-NDL VAR_015246 
Natural variant 168 1 V—MinAR-NDI VAR_015247 
Natural vanant 175 1 G = R in AR-NDI. VAR _015248 
Natural variant 181 1 C — W in AR-NDI. VAR_015249 
Natural variant 185 1 P—AinAR-NDL VAR_015250 
Natural variant 187 1 R — C in AR-NDI; mutant protein does not fold properly and is not functional — ~ _VAR_004402 
Natural variant 190 1 A — T in AR-NDI; mutant protein does not fold properly and is not functional —— + — | VAR (18251 


Figure 9.4. (Continued) 


large-scale high resolution genome sequencing, physical 
maps based on natural landmarks are required since not 
all genes have observable phenotypes. These landmarks are 
often repeating sequences, which form marker regions dis- 
tributed across chromosomes. A good example is provided 


by variable number tandem repeats (VNTRs; also known 
as minisatellites). These contain regions of 10-100 base 
pair sequences, which repeat a variable number of times 
in the genomes of different individuals and can be ampli- 
fied by the polymerase chain reaction. Short tandem repeat 
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ANONYMOUS MAL 
BLOOD CELLS 


RP11 BAC LIBRARY 


543,797 CLONES 


HIND II DIGESTION 


FINGERPRINT OF 
EACH BAC CLONE 


IDENTIFY FINGERPRINT 


E 


— 
STS CHROMOSOME 


CLONE CONTIGS 


SELECT 19,145 CLONES 
FOR SHOTGUN 


SEQUENCING 


IDENTIFY OVERLAPS s 
(PHRED & PHRAP) 


(SIX MAIN CENTRES) 


CREATE PHYSICAL MAP 


| 


ASSOCIATE SEQUENCED 


BACS WITH FINGERPRINTED 


BACS ON PHYSICAL MAP 


Figure 9.5. The Human Genome Project. An overview of the actual approach taken is shown. This is a combination of the ‘map-based’ and 
came from blood cells of anonymous male donors which generated a 
RP11 BAC library (although clones from other libraries were also used). Each clone was digested with the restriction enzyme Hind MI and 
used to generate a fingerprint on agarose gels. This allowed unambiguous identification of each clone and of overlaps between clones. These 
were associated with genetic and physical markers on specific human chromosomes by probe hybridization and contributed to building a 
physical map. Selected clones (some 19 145) were sequenced by shotgun sequencing and sequence data were assembled with computer 
programs PHRED and PHRAP. Sequenced BACs were then associated with fingerprint BACs to generate a sequence contig map and the 


whole genome shotgun approaches. The bulk of the DNA sequenced 


final genome sequence was assembled with the aid of GigAssembler. 


l 


FINAL GENOME 


polymorphisms (STRPs; also known as microsatellites) are 
sequences of 3-5 base pairs repeated 10-30 times which 
have been found to be more evenly distributed through the 
genome than VNTRs. A third type of marker is offered by 
sequence tagged sites (STSs), short sequences of 200-500 
base pairs, which have a unique location in the genome. The 
genome is cut into small pieces with restriction enzymes and 
each piece individually sequenced in a process called shot- 
gun sequencing. Regions of overlap allow the assembly of 
progressively longer sequences some of which will contain 
markers. These markers are used to generate the physical 
map of the genome by gradually building up a hierarchi- 
cal picture of its physical structure. By contrast, a second 
approach to assembling the genome (taken by Celera) was 
to eschew creation of maps and instead to apply shotgun 
sequencing to the whole genome by randomly shearing it 
into 2000-10 000 base pair fragments and sequencing these 
individually. Although this approach had proven remark- 
ably successful in viral and prokaryote genomes, which 
lack extensive repeat sequences, its application to eukaryote 
genomes was quite controversial initially. 

In practice, the map-based and whole genome shotgun 
approaches are not mutually exclusive and combining the 
two approaches assembled the human genome. Human DNA 
fragments (100 000-200 000 base pairs) were generated by 
restriction digestion and cloned into bacteria as a library 
of vectors called bacterial artificial chromosomes (BACs). 
These can accommodate fragments as large as 250 000 base 
pairs, so a library of 20000 BAC clones accommodated 
the entire genome. Restriction mapping of individual frag- 
ments allowed identification of overlap and thus the clones 
could be ordered. Each BAC clone was ‘mapped’ onto hu- 
man chromosomes using markers (VNTRs, TSRPs, STSs) 
as described above to generate a physical map. Each BAC 
clone was then cut into smaller 2000 base pair pieces and 
sequenced at each end. Piecing these sequences together as- 
sembled the sequence of the BAC clone by generating first 
a contiguous clone map or contig and eventually a complete 
sequence. The physical map allowed assembly of the BACs 
into the final version of the genome. Surprisingly, the hu- 
man genome was found to contain significantly less genes 
than anticipated (25 000-30 000 rather than 50 000-100 000 
as previously thought) and contains hundreds of bacterial 
genes apparently acquired by horizontal transfer during the 
vertebrate lineage. 

In addition to the human, complete and partial genomes 
are available at NCBI for the following vertebrates: baboon, 
bat, cat, chicken, chimp, cow, coyote, dog, dolphin, duck- 
billed platypus, elephant, frog, gibbon, gorilla, hedgehog, 
horse, lemur, macaque, mouse, orang utang, opossum, pig, 
rabbit, rat, sheep, shrew, squirrel and wallaby. Genomic data 
are also available for important insects including the honey- 
bee, wasps, silkworm, fruitfly Drosophila and the Anophe- 
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les mosquito responsible for carrying the causative agent of 
malaria. The complete genome of this causative agent, an 
intracellular protozoan parasite called Plasmodium, is also 
now available as are representative plant, worm and fungal 
species. In addition to this central genome resource many 
genome projects operate their own web sites independently 
of NCBI (Table 9.2). This allows combination of species- 
specific information with genome data. 


9.2.4 Expressed Sequence Tag Databases 


It is noteworthy that genomes of even the best-studied 
species contain a large percentage of genes of as-yet un- 
known function (e.g. 25% of the yeast Saccharomyces cere- 
visiae’s relatively small genome of 5885 genes). These 
are nucleotide sequences which appear to have in-frame 
start/stop codons and appropriate up- and downstream regu- 
latory sequences but which show little homology to genes of 
known function. Moreover, only a proportion of the genes 
encoded in genomes are expressed during the organism’s 
life. These genes may be no longer necessary for survival but 
have been retained in the genome during evolution. Genes 
which are transcribed to mRNA can be detected as expressed 
sequence tags (ESTs) which are short subsequences of cD- 
NAs transcribed from mRNA. Databases of such ESTs pro- 
vide a powerful resource for identifying new families of 
proteins and are hosted by NCBI. They are important be- 
cause they allow us to concentrate research efforts on ac- 
tively transcribed genes rather than diluting our interest on 
essentially inactive ones. An illustration of the usefulness of 
EST databases is given in Example 9.1. 


9.2.5 Single Nucleotide Polymorphism 
(SNP) Database 


DNA sequencing has revealed that some locations or loci in 
genes may be mutated in different individuals resulting in 
an alteration in primary structure at that position. These are 
called single nucleotide polymorphisms (SNPs; pronounced 
‘snips’). These sites are often of interest for recognizing 
predisposition to disease in people carrying the polymor- 
phism and the human genome is known to contain 1.4 mil- 
lion SNPs. NCBI hosts a database of human SNPs, dbSNP, 
which is accessible directly through Entrez. 


9.3 TOOLS FOR ANALYSIS OF 
PRIMARY STRUCTURES 


The DNA sequence of a gene or even a genome is in itself 
rather uninformative consisting as it does of a bewilderingly 
large number of As, Gs, Cs and Ts repeating seemingly at 
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Example 9.1 Database Mining Reveals Expression of the Neuronal Calcium Sensor Family in Tissues 
Peripheral to the Nervous System 


In study of mammalian biochemistry and physiology we are frequently interested in elucidating the tissue distribution of 
particular protein families. We often find that specific protein isoforms are expressed in specific tissues and this can give 
important clues to the role of these isoforms. The neuronal calcium sensor (NCS) protein family has been described as 
being largely specific to the nervous system. However, some members of this family are known to be expressed in other 
tissue types: VILIP-3 (haematopoietic system, gut, kidney, spleen, testis) and KChIPs (heart) raising the possibility that 
they may be involved in functions besides neuronal signalling. 

GenBank sequences in NCBI are partitioned into a nonredundant set of gene clusters in a resource called UniGene 
(Table 9.3). This lists likely expression frequencies for gene clusters by tissue distribution of ESTs. UniGene was probed 
for VILIP-1, hippocalcin and NCS-1 and the results analysed by subtracting sequences observed in pathological contexts 
(e.g. cancer). 


Breakdown by Tissue 


Hs. 288654 

Bladder 0 0/21715 
Blood 0 0/78292 
Bone 0 0/55730 
Bone Marrow 0 0/36541 
Brain 132 61/462100 
Cervix 0 0/41264 
Colon 5 1/179987 
Eye 89 15/168244 
Heart 0 0/58912 
Kidney 7 1/135458 
Larynx 0 0/27551 
Liver 0 0/131463 
Lung 17 5/288794 
Lymph Node 0 0/128142 
Mammary Gland 0 0/128200 
Muscle 0 0/109115 
Ovary 0 0/95612 
Pancreas 11 1/84639 
Peripheral 0 0/24996 
Placenta 0 0/237797 
Prostate T 1/133636 
Skin 0 0/165608 


Portion of UniGene data for hippocalcin. Expression profile is suggested by observed EST frequency. Note that the 
bulk of expression is in brain. 

The results (Table Ex1.1) were then viewed with the SAGE AnatomicViewer (http://cgap.nci.nih.gov/ 
SAGE/Anatomic Viewer), to derive a ‘virtual’ Northern blot (Section 5.11.4). 

In this paper, it was revealed that VILPI-1 and NCS-1 are also expressed in peripheral tissue while hippocalcin is 
restricted (98%) to nervous tissues. These in silico results were confirmed by Western blotting. Analysis of EST data by 
SAGE is no substitute for actual experimentation. However, this example shows how bioinformatic analysis can suggest 
sensible experiments as well as contributing to understanding results. 

Reprinted with permission of Elsevier from Gierke et al., (2004) Expression analysis of members of the neuronal 
calcium sensor protein family:Combining bioinformatics and Western blot analysis. Biochem. Biophys. Res. Commun., 
323, 38-43. 


(Continues)_| 
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M (Continued) 

Table 9.4. Tissue-specific expression of NCS proteins as deduced from human UniGene database. 

VILIP-1 Hippocalcin NCS-1 
UniGene cluster Hs.2288 Hs.288654 Hs.301760 
Total no. EST sequences 196 96 299 
Pathological 23 26 121 
Other (pooled, unknown) 15 3 19 
No. EST sequences counted 158 (100%) 67 (100%) 161 

(100%) 

Tissue 
Brain 117 (74%) 55 (82%) 76 (47%) 
Nerves 14 (9%) 1 (1.5%) 10 (6%) 
Eye/optic nerve 6 (4%) 9 (13.4%) 18 (11%) 
Heart 1 (0.64%) 6 (3.7%) 
Liver 2 (1.3%) 
Spleen 
Lung 3 (1.9%) 2 (1.2%) 
Kidney 4 (2.5%) 7 (4.3%) 
Stomach 1 (0.64%) 1 (0.6%) 
Colon 1 (0.64%) 4 (2.5%) 
Skin 6 (3.7%) 
Testis 2 (1.3%) 
Ovary 1 (0.6%) 
Placenta/uterus 2 (1.3%) 5 (3%) 
Glands. adrenal gland 1 (1.3%) 9 (5.6%) 
Pancreas 5 (3%) 
Prostate 4 (2.5%) 1 (1.5%) 4 (2.5%) 
Breast 6 (3.7%) 


random. In order to extract information of real relevance 
we need to be able to analyse sequences with fast and ef- 
ficient computer programs. A simple example of this is the 
problem of translating a DNA nucleotide sequence into a 
corresponding protein amino acid sequence. This is a triv- 
ial operation for a short sequence but becomes significantly 
more challenging as the DNA sequence becomes longer 
since the number of operations required quickly becomes 
prohibitively large. Fortunately, since sequences are simply 
a string of building blocks, they lend themselves readily to 
representation and analysis as a series of numbers and thus 
may be ‘read’ by computer programs specifically designed 
for the task. The following sections give a short introduction 
to some of the programs currently in use (URLs are listed in 
Table 9.3). These maximize our possibilities for interpreting 
DNA and protein sequence data. 


9.3.1 BLAST Programs 


The Basic Local Alignment Search Tool (BLAST) was de- 
veloped for aligning query sequences with sequences in 
databases to identify locally conserved or similar regions. 


The program is very fast and overcomes shortcomings of 
conventional programming methods, which were too slow 
for complete searches of large databases. The program max- 
imizes local similarity with a maximal segment pair score in 
contiguous regions of the sequence. The output of the search 
is a reliable measure of sequence identity arrived at rapidly, 
which is statistically robust. Various forms of the BLAST 
tool are available at the NCBI, EBI, PIR and ExPASy 
sites (Figures 9.6 and 9.7). These include: BLASTP (amino 
acid sequence searches in protein sequence databases), 
TBLASTN (amino acid sequence searches in translated nu- 
cleotide sequence databases), TBLASTX (Translated nu- 
cleotide sequences searches in translated nucleotide se- 
quence databases). There is also the possibility of searching 
with short query sequences, using the BLAST tool to search 
entire genomes and of using megaBLAST for searches 
with longer query sequences than BLASTN (nucleotide se- 
quence searches in nucleotide sequence databases). One se- 
cret of the speed of BLAST programs is that they do not 
attempt to match sequences along their full length but rather 
look for regions of highest similarity. WUBLAST (devel- 
oped at Washington University and accessible at EBI and 
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Sign in} [Regis 


> NCB BLAST Home 
BLAST finds r 


1 of similarity between biological sequences. more... 


EZ Designing or Testing PCR Primers? Try your search in Primer-BLAST, Go} 


BLAST Assembied Genomes 


Choose a species genome to search, or list all genomic BLAST databases 


© Human a Oryza sativa 

© Mouse © Bos taurus 

o Rat a Danio rerio 

a a Drosophila melanogaster 


Arabidopsis thaliana 


Basic BLAST 
Choose a BLAST program to run. 


Search a nucleotide databas 


nucleotide query 
contiguous megablast 


nucleotide blast | 


1 | Search protein database using a protein query 


rotein ble 
protein biasi Algonthms: blastp, psi-blast, phi-blast 


blastx | Search protein database using a translated nucleotide query 


Search translated nucleotide database using a protein query 


tblastx | Search translated nucleotide database using a translated nucleotide query 


Specialized BLAST 


Choose a type of specialized search (or database name in parentheses.) 


a Make specific primers with Primer-BLAST 

© Search trace archives 

o [find conserved domains in your sequence (cds) 

© Pind sequences with similar conserved domain architecture (cdart) 


News 


Align Sequences with BLAST 


Anew Bseq functionally has been 
added to the standard BLAST pages that 
allows you to align a query against a set of 
subject sequences. 
2008-08-04 12:56:52 
E) More BLAST news... 
a Gallus gallus 
© Pan troglodytes 
a Microbes 
a 


Apis mellifera Tip of the Day 


How to Search Custom Databases in 
Web-Biast Using Entrez Queries. 


A powerful feature of the BLAST Web 
interface is the abilty to imt BLAST 
searches to a subset of any database 
using a standard Entrez query. Skilful use 
of Entrez queries allows the equivalent of 
on-the-fly construction of databases of 
exact composition. 


E More tips... 


Figure 9.6. BLAST tool. Some of the various forms of BLAST available through NCBI. 


ExPASy) was developed to permit gapped alignments which 
speed up searching. PSI-BLAST (available at NCBI) be- 
gins with a ‘one at a time’ search similar to BLAST but 
then goes on to derive pattern information from multiple 
alignments on the original ‘hits’. The program continuously 
reprobes the database with the pattern detected in an iter- 
ative manner constantly refining the pattern obtained. This 
allows much more powerful detection of distantly-related 
sequences than is possible with BLAST. Other programs 
capable of picking up such distant relationships are FastA 
and ClustalW. In using these tools, the user should always 
be cognizant that default settings may differ depending on 
the site hosting the tool. It can be worthwhile to familiar- 
ize oneself with these settings and, if necessary, to vary 
them. 


9.3.2 FastA 


FastA is a family of programs available at EBI and PIR 
which allows rapid screening of proteomic, genomic, pro- 
tein and nucleotide sequence databases to reveal sequence 
similarity. It can be very specific when identifying long 
regions of low similarity even in highly diverged proteins 


or genes. Amino acid or nucleotide sequences are entered 
in the program in a particular text-based convention known 
as FastA format (Figure 9.8). This begins with a single line 
description commencing with > followed by an arbitrary 
title (e.g. ‘>protein 1; Note, there is no gap between ‘>’ and 
‘protein’ ). Subsequent lines contain the sequence in single 
letter convention for amino acids (U is used for seleno- 
cysteine) or nucleotides as specified by the International 
Union of Applied Chemistry. In fact, many bioinformatics 
programs require query sequences to be entered in FastA 
format. 


9.3.3 Clustal W 


Sometimes alignment searches between a query sequence 
and a database will reveal several genes or proteins with 
points of sequence similarity. We would then like to know 
if there is a relationship between the group formed by 
the statistically significant hits and the query sequence. 
One approach would be to carry out exhaustive pair-wise 
alignments with BLAST or FastA but this would be very 
time-consuming. Clustal programs (of which the latest 
is Clustal W2) allow alignment of multiple sequences in 
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(a) Ss NCBI results of BLAST 


BLASTP 2.2.10 [Oct-19-2004] 


Reference: 


Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 


programs", Nucleic Acids Res. 25:3389-3402. 
RID: 1102933551-3806-200588368848.BLASTO4 
Query= Aquaporin 1 


(269 letters) 


Database: Non-redundant SwissProt sequences 
159,656 sequences; 58,766,444 total letters 


If you have any problems or questions with the results of this search 


please refer to the BLAST FAQs 


Taxonomy reports 


Distribution of 161 Blast Hits on the Query Sequence 


[Mouse-over to show defline and scores. Click to show alignments 


Color Key for Alignnent Scores 


(b) Sequences producing significant alignments: 


gi|267412| sp| P. 
gi| 543832) sp! Qo 
gila? plP 


501|AQPA RANES Aquaporin FA-CHIP 


Q06019|MIP RANPI Lens fiber major intrinsic pr... 


72, AQP1 HUMAN Aquaporin-CHIP (Water channe... 
2013/AQP1 MOUSE Aquaporin-CHIP (Water channe... 
75|AQP1 RAT Aquaporin-CHIP (Water channe... 
gi|1351965|sp|P47S65;}AQP1 BOVIN Aquaporin-CHIP (Water chann... 
gil3 Oj sp|P56401)40P1 SHEEP Aquaporin-CHIP (Water chann... 


Score E 
(bits) Value 


47 e-134 
417 e-116 
245 7e-65 


Figure 9.7. Results of BLAST search. (a) The sequence of human aquaporin 1 is entered in the search box in FastA format. (b) A listing 
of 100 ‘hits’ is presented with an ‘E value’. This is a statistical estimate of the likelihood of a match occurring by chance. The value 


for the most likely hit in our search is 1 x 107150 


aquaporin 1. 


a single step so as to reveal points of similarity within 
related sequences and to reveal patterns of their divergence 
(Figure 9.9). This program is available at EBI and can be 
downloaded to your computer. 


9.3.4 Hydropathy Plots 


In Chapter 6 we saw that hydrophobic amino acids have a 
strong preference to be located away from solvent water so 


which suggests that we can be very confident indeed that our search sequence is 


these are often found in locations such as the interior of 
proteins or in membrane-bound protein surfaces. Con- 
versely, polar amino acids have a strong preference for 
being in contact with water and tend to be found on 
the surface of proteins. A statistical analysis of the fre- 
quency with which each of the 20 common amino acids 
is found on the surface or interior of proteins for which 
three-dimensional structures are available was carried out 
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FQUENC DATABASE STATISTICAL 
SCORES ALIGNMENTS RANGE RANGE FRTER ESTIMATES 
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Figure 9.8. Fast A. (a) This tool is accessible at the European Bioinformatics Institute site. (b) A sequence is entered in the search box or 


else users may browse to a file. 


by Kyte and Doolittle in the 1980s and this resulted in 
a computer program containing an algorithm which al- 
lowed prediction of the eventual location of stretches of 
amino acids within a protein sequence (Figure 9.10). Such 
plots, called hydropathy plots, were originally used to pre- 
dict location of likely immunogenic regions of newly- 


sequenced genes (these could then be synthesized by Mer- 
rifield peptide synthesis and antibodies could be raised 
to at least some of the putative epitopes). However, hy- 
dropathy plots are now more widely used for prediction 
of transmembrane domains of membrane-bound proteins 
(Example 9.2). 
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(b) CLUSTAL 2.0.10 multiple sequence alignment 
Human MASEFKKKLFWRAVVAEFLATTLFVFISIGSALGFKYPVGNNQTAVQDNVKVSLAFGLSI 
Rat MASE IKKKLFWRAVVAEFLAMILFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI 
khkk HHH RHE RHEE HH, hh Hahn hehhhhhe 
Human ATLAQSVGHISGAHLNPAVILGLLLSCQISIFRALMYIIAQCVGAIVATAILSGITSSLT 
Rat ATLAQSVGHISGAHLNPAVILGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL 
PHI IIR HIF IE IR II TTI IH TTI III ITI SFM HTH IIIT EEEE EEEE EEE 
Human GNSLGRNDLADGVNSGQGLGIEIIGILQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH 
Rat ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH 
(Jee eee ee SES SSS SESS SSS SSS TS SSCS STE S PSPS ESSE SSS SSE S TT TS SS SF 
Human LLAIDYTGCGINPARSFGSAVITHNFSNHWIFWVGPFIGGALAVLIYDFILAPRSSDLID 
Rat LLAIDYTGCGINPARSFGSAVLTRNFSNHWI FWVGPFIGSALAVLIYDFILAPRSSDFID 
REAR EREEERERRER RE REEREREREREREEE Keke HEKHKKAKHRHHK: RE 
Human RVKVWISGQVEEYDLDADDINSRVEMKPK 
Rat RMKVWISGQVEEYDLDADDINSRVEMKPK 
ERE RHEE KEKE RHEE EERE ARHES 
Ti 


Figure 9.9. Clustal W. This allows alignment of related sequences. (a) Two sequences (human and rat aquaporin 1) are entered in Fast A 
format and submitted to Clustal W. (b) The program reports an alignment with identical residues underlined by asterisks and similar residues 


by colons. 
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(a) 4 


(b) 


(c) Query Sequence (protein1) 
compared with 
P30868 escherichia coli. glucuronide carrier protein (glucuronide permease). (UIDB_ECOLI] 


s proteini 
~ UIDB_ECOLI 


Figure 9.10. Hydropathy searching. Net hydrophobicity varies along a polypeptide chain with hydrophobic regions likely to be found in 
the interior of proteins or on the surface of membrane-bound proteins. Hydrophobic regions give positive hydropathy while hydrophilic 
residues are negative. (a) Hydropathy plot from a single sequence. In this analysis a window of 17 residues is selected and hydropathy is 
averaged across this window. (b) Comparative hydropathy plots for two sequences. The sequences of human aquaporin | and rat aquaporin 
5 are compared (a window of 7 residues is selected in this case). (c) SwissProt may be searched for sequences with similar hydropathy 
profiles. This reveals that the glucuronide permease of Escherichia coli yields a similar hydropathy profile to human aquaporin 1. (Analyses 
performed at URL of Ebioinfogen, The Weizmann Institute, Israel; Table 9.3). 
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Example 9.2 Sequence Alignments Identify a Family of Bacterial Transmembrane Small Multidrug Resistant 
Proteins 


Bacterial resistance to antibiotics is a pressing public health problem. One mechanism for this resistance is expression 
of transmembrane proteins called small multidrug resistance (SMR) proteins which actively pump antibiotics from cells. 
This pumping action is either driven by hydrolysis of ATP or else is coupled to inward movement of protons down their 
transmembrane concentration gradient. Because this family of proteins is transmembrane, it is difficult to solve their 
structure by X-ray diffraction or NMR (Chapter 6). One of the best-studied bacterial SMRs is EmrE. This was found 
by CryoEM (Section 6.6.2) to contain four transmembrane «-helices per subunit with a dimeric quaternary structure. In 
structural modelling, assignment of hydrophobic sequences to these eight helices would result in > 160 000 permutations. 
However, if a symmetry arrangement existed between two parts of the structure, this number reduced to 48. Sequences 
of similar proteins were used to identify most conserved residues based on the rationale that residues packed against 
helices are well-conserved through evolution. By contrast, residues facing away from the protein core towards lipid can 
mutate much more readily. In this way, the helices could be identified and a plausible model generated for their relative 
orientation in space which revealed a clearly symmetric relationship as shown: 


M4 
Q 
M3' 


cr 
Cy 


M2' 


M3 


mz]: |4 |s 6 |7 BE 


Variable Conserved 


Importantly, this structure accords closely with the known biochemical properties of this protein unlike previous structural 
studies which seem to have trapped nonphysiological structural variants. 


9.3.5 Predicting Secondary Structure 


Protein secondary structure reflects common strategies for 
folding up the extended polypeptide chain based principally 
on the tendency of dihedral angles to occupy limited parts 
of the Ramachandran diagram and formation of hydrogen 
bonds between carbonyl and amide groups of separate pep- 
tide bonds (Figure 9.11, see colour plate section). The re- 
quirement to maximize repulsion of ‘R’ groups and to form 
straight, undistorted hydrogen bonds leads to formation of 
two common types of secondary structure: «-helix and B- 
pleated sheet. Other types of secondary structure exist and 


are found in proteins with specific structural requirements 
(e.g. the Poly Pro II structure of the collagen triple helix) but 
these two types dominate the bulk of secondary structure. 
Secondary structure components of proteins are linked by 
regions of random coil and by turns. Random coil is some- 
thing of amisnomer as the polypeptide chain in these regions 
usually has a fixed structure in space but the term simply 
denotes that the structure in that region does not fall into the 
other convenient categories. Prolines are disruptors of the 
main types of secondary structure but allow the polypeptide 
chain to change direction (i.e. turn) because they can adopt 
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74 44 566 889868764 ő 6775 6788888875 
eeii ates seep oe EE. a... aD 
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b bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb- b bbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbb 
32463269767686576 2 2 6876895665 2 3022 22396553: 36999999986666326 359 


(b) Strand/Turn/Helix Probabilities 


Probability of being in a stand for Query sequence 1 (psp4.m), 13-Dec-2004 10:16:40 


Residue Postion 
Probability of being in a turn for Query sequence 1 (pap4.m), 13-Dec-2004 10:16 40 
Secondary-stiucture probabilities for Query sequence 1 (vap8.m) 


5% 100 150 W0 25 
Residue Postion 


Probability of being in a helix for Query sequence 1 ipapt.m), 13-Dec-2004 10:16:40 


Figure 9.12. Prediction of Secondary structure for human aquaporin 1. (a) Results from the Predict Protein server. This contains an analysis 
by PROF which predicts 63.47% o-helix (H), 2.2% (3-pleated sheet (E) and 34.3% random coil (L). The numbers (minimum 1 and maximum 
10) denote the significance of assignment. (b) Graphical representations from the PSA server of probability of formation of «-helix, turns 
and f-sheet. Notice that the structure is indeed dominated by periodically repeating œ-helices which are characteristic of transmembrane 
proteins. 


THIOREDOXIN 
SUPRAMILY 


CYTOSOLIC 
GSTS 
< 


THIOREDOXIN 


GSH 
PEROXIDASES 


THETA 


aa 


BIOINFORMATICS 325 


SHARE PROTEIN 


SUPRAFAMILIES 
FOLD 


PROTEIN 
SUPERFAMILIES 


PROTEIN 
FAMILIES 


PROTEIN 
CLASSES 


Figure 9.13. Protein families. Based on architecture, fold, domain structure and sequence homology, various levels of similarity can be 
observed amongst proteins. The thioredoxin suprafamily consist of several protein superfamilies which share the thioredoxin fold presumably 
because they all bind glutathione. Only three of these are shown. The GSTs are one superfamily which in turn consists of two families: 
cytosolic GSTs (which are dimeric) and membrane-associated proteins involved in eicosanoid and glutathione metabolism (MAPEG) which 
are membrane-bound and trimeric. The cytosolic GST family consists of at least 13 classes of which three are shown. 


a cis conformation far more often than other residues. There 
are many different types of turns in proteins and these also 
have been subjected to classification. For most purposes 
a-helix, 6-pleated sheet, turns and random coil allow an 
adequate description of secondary structure. Reference has 
been made in Chapter 3 to experimental estimation of sec- 
ondary structure by spectroscopic techniques such as FTIR 
(Section 3.6.4) and CD (Section 3.5.5). However, using a 
similar approach to that of Kyte and Doolittle for hydropa- 
thy, it is possible to estimate the frequency with which all 20 
amino acids are found in «-helix or B-pleated sheet or nei- 
ther. Some residues are found strongly to promote «-helix 
or B-pleated sheet (strong formers) while others strongly 
disrupted their formation (breakers). Still other residues are 
found to be intermediate or neutral in their effect. This forms 
the basis for prediction of secondary from primary structure 
using the Garnier Osguthorpe Robson (GOR) algorithm, 
the predict protein service and the PSA server (Table 9.3). 
These algorithms read the amino acid sequence as a series of 
numbers reflecting the average propensity to form or break 
a particular secondary structure type at that position. The 
numbers are usually averaged over a ‘window’ of residues, 
which allows us to predict the most likely secondary struc- 
ture of a query protein for which we may only have the amino 
acid sequence (Figure 9.12). When using on-line algorithms 
for prediction of hydropathy plots or secondary structure it 
should be borne in mind that these are based on the aver- 


age behaviour of amino acids in the comparatively limited 
number of proteins for which we have three-dimensional 
structures. Accordingly, these algorithms sometimes have a 
relatively low predictive accuracy. 


9.3.6 Identifying Protein Families 


Just as the secondary structure and hydropathy properties 
of proteins are encoded in the primary structure, so is the 
three-dimensional fold, which determines the ultimate bio- 
logical function of the protein. It is clear now that individual 
ancestral proteins gave rise to similar but distinct proteins 
by processes such as divergence, which resulted in pro- 
tein classes, families, superfamilies and even suprafamilies. 
The GSTs are a good example of this since these enzymes 
can be classified in such a hierarchical manner within the 
thioredoxin suprafamily (Figure 9.13). A feature of proteins 
encoded by the human genome is that structural units such 
as motifs and domains are used and reused in a particularly 
versatile way to result in creation of a higher level of biolog- 
ical functionality than in other organisms. Some of the pro- 
grams described above reveal similarities between related 
proteins but there is often such a low level of overall se- 
quence identity that special analyses are required to allocate 
proteins to distinct families. Pfam is a searchable database 
available at the Sanger Institute (and elsewhere), which au- 
tomatically allocates protein sequences to specific families 
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Number of documents in PROSITE containing the search term:10 

» POOCS1353 Arsenate reductase arsC family profile 
PO0C51324 ERV/ALR sulfhydryl oxidase domain profile 
PDOC00173 Glutaredoxin active site signature and domain profile 
PDO0C51340 MOSC domain profile 
PDOC51396 PUL domain profile 
POOC00073 Pyridine nucleatide-disulphide oxidoreductases class-| active site 
PDOC00496 Pyridine nucleotide-disulphide oxidoreductases class-|| active site 
POOC50404 Soluble glutathione S-transferase N- and C-terminal domain profiles 
PDOC00172 Thioredoxin family active site signature and domain profie 
» PDOC0NS73 Tpx family signature 


(c) °* Taxonomic range: Archaebacteria, Bacteriophages, Eukaryotes, Prokaryotes (Bacteria), Eukaryotic viruses 
« Maximum known number of repetitions of the pattern in a single protein: 3 
+ FT_KEY: DOMAIN 
+ FT_DESC; Glutaredoxin 
+ VERSION: 1 


True positive hits: 


AHPF_ECOLI (P35340), AHPF PSEAE (Q91622), AHPF_PSEPK (POA155), 
AHPF_PSEPU (POA1S6), AHPF_SALTY (P19480), AHPF_STAAS (005204), 
AHPF_STAAC (QSHIR6), AHPF_STAAM (P66012), AHPF_STAAN (P99118), 
x AHPF_STAAR (Q6GJRO), AHPF_STAAS (Q6GC92), AHPF_STAAW (P66013), 
b: AHPF_XANCH (006465), DHNA BACSU (P42974), DHNA BACYN (P26829), 
GLRX1_BOVIN (P10575), GLRX1_CAMPM (Q8V2Vi), GLRXĪ_ CAMPS (Q775X4), 
GLRX1_CHICK (P79764), GLRX1_CWPXB (QSQMYS), GLRX1_CWPXG (Q80E01), 
GLRX1_ECOLI (P68688), GLRX1_ECTVM (Q8JLFS), GLRX1_HUMAN (P35754), 
GLRX1_MOUSE (Q9QUHO), GLRX1_PIG  (P12309), GLRX1_RABIT (P12864), 
GLRX1_RABPU (Q6RZN3), GLRX1_RAT  (Q9ESH6), GLRX1_RICBR (Q1RHJO), 
GLRX1_RICCN (Q92J02), GLRX1_RICFE (Q4UKL7), GLRX1_RICPR (Q92DW1), 
GLRX1_RICTY (Q68XG4), GLRX1_SALTI (POA1P9), GLRX1_SALTY (POA1PS), 
GLRX1_SCHPO (036032), GLRX1_SHIFL (P68689), GLRX1_SYNY3 (P74593), 
GLRX1_VACCA (Q76ZV3), GLRX1_VACCC (P68690), GLRX1_VACCP (P68691), 
GLRX1_VACCT (Q77TL9), GLRX1_VACCV (P68692), GLRX1_VARV (P32983), 
GLRX1_YEAST (P25373), GLRX2_BOVIN (Q32L67), GLRX2_ HUMAN (Q9NS18), 
GLRX2_MOUSE (Q923X4), GLRX2_PONAB (Q5RC53), GLRX2 RAT  (Q6AXW1), 
GLRX2_RICCN (Q92GH5), GLRX2_RICFE (Q4UK94), GLRX2_RICPR (0053957), 
GLRX2_RICTY (Q68W05), GLRX2_SCHPO (Q9UTI2), GLRX2_SYNY3 (P73492), 


Figure 9.15. The PROSITE database. (a) PROSITE (Table 9.3) can be searched by text for ‘thioredoxin’. (b) This reveals nine families. 
(c) One of listed entries. 
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Figure 9.16. The PANDIT database. (a) Portion of the alignment option for the thioredoxin family in PANDIT. (b) Portion of the phylogenic 


tree for the same family. 


(Figure 9.14, see colour plate section). This currently (July, 
2007) contains more than 9000 families covering some 75% 
of known protein sequences (an individual protein may 
belong to more than one family). An especially useful fea- 
ture of the resource is that it contains data on possible 
structural interactions between proteins. Other protein fam- 
ily databases listed in Table 9.3 include PROSITE (avail- 
able at ExPASy; Figure 9.15), PANDIT (available at EBI; 
Figure 9.16) and iProClass and PIRSF (available at PIR; 
Figure 9.17). 


9.4 TERTIARY STRUCTURE 
DATABASES 


The biological properties of biomacromolecules arise from 
their three-dimensional structure. In Chapter 6 we saw that, 


although this structure is ‘encoded’ in the primary struc- 
ture, it is not yet possible directly to calculate tertiary from 
primary structures because we do not fully understand the 
processes by which biomacromolecules fold up; the protein 
folding problem (Chapter 6). We also saw that elucidation 
of three-dimensional structure is a major experimental un- 
dertaking involving principally X-ray diffraction through 
crystals or multi-dimensional NMR. These techniques have 
severe inbuilt limitations. It is difficult to find precise con- 
ditions for crystallizing proteins and not all proteins are 
fully soluble in unaggregated form at acid pH values. By 
contrast, high-throughput methodologies are available for 
sequencing genomes, genes and proteins. Thus, in search- 
ing sequence databases we often find a very large num- 
ber of polynucleotide sequences, somewhat fewer protein 
sequences and only a handful of three-dimensional struc- 
tures for any given protein type. More often, we find no 
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Figure 9.17. The PIRSF database. (a) Text search for superfamilies similar to thioredoxin. This search results in 50 families. (b) Portion of 
the summary report for one superfamily. (c) Summary report for one entry. 
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Figure 9.18. The Cambridge Crystallographic Data Centre. Clicking on each icon gives information about the programs used for visualizing 
and analysing small molecules in this database. Note that CCDC does not contain structures of inorganic or metals/alloys. These are held at 
the Inorganic Crystal Structure (http://www.icsdwebfiz-karlsruhe.de) and CRYSTME (http://www.tothcanada.com) databases, respectively. 


three-dimensional structure for a query protein. This is es- 
pecially true for proteins associated with membranes, a cat- 
egory representing almost a third of the 70% of genes of 
known function in the Human genome. Despite great techno- 
logical advances we still have disproportionately few struc- 
tures for membrane proteins such as aquaporins. In contrast 
to this paucity of structural data, more than 400 000 struc- 
tures are available for small molecules in the Cambridge 
database and these provide important avenues for new drug 
discovery. The following sections offer a brief introduction 
to the main three-dimensional structure databases. 


9.4.1 Cambridge Database 


Structures of small molecules are much easier to determine 
by X-ray diffraction than biomacromolecules because they 
are easier to crystallize, give stronger and simpler diffrac- 
tion patterns and are more structurally monodisperse. These 


structures (as well as structures determined by neutron 
diffraction) are archived in a database at the Small Molecule 
Cambridge Crystallographic Data Centre (CCDC, Cam- 
bridge, England). Various tools are available here for com- 
parison and analysis of these structures (Figure 9.18). These 
structures can be ‘docked’ onto protein structures and the 
resulting complex energy minimized to reveal effects of the 
ligand on protein structure. Small molecules capable of bind- 
ing proteins are of interest as potential drugs and this ap- 
plication of bioinformatics is now an important activity in 
drug discovery (Section 9.6.3 below). 


9.4.2 Protein Databank (PDB) 


Three dimensional structures of biomacromolecules are 
now principally archived by the Research Collaboratory 
for Structural Biology (RCSB) as described in Chapter 
6 (Section 6.8.1). While the overwhelming majority of 
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Figure 9.19. The Protein Data Bank (PDB).The PDB can be searched by PDB ID (a 4-letter code unique to each file), by text or by 
author. Other resources available from the front-page include data on holdings, methodologies for structure determination and downloadable 


software. 


biomacromolecular three-dimensional structures are held in 
the PDB it should be borne in mind that a small number of 
atomic coordinate files are not deposited there but are instead 
held in specialist databases. Also, there is a separate archive 
of NMR-derived structures BioMagRes at the University of 
Wisconsin. Moreover, some of the older files held by RCSB 
are bibliographic files only and do not contain atomic coor- 
dinates. Access to the PDB is possible from many sequence 
databases and in these contexts is sometimes referred to 
as the Macromolecular Structure Database (MSD). Struc- 
ture files are submitted to the PDB ahead of publication in 
the peer-reviewed literature and receive a unique PDB ID or 
identification code (this code is given in the eventual publica- 
tion), which is analogous to the accession code of a sequence 
database. The PDB ID for the file for human aquaporin 1 is 
1J4N. If these authors submitted a second structure file for 
this protein, it would be denoted 2J4N, and so on. The PDB is 
highly redundant with, for example, hundreds of structures 
for the enzyme lysozyme. It frequently contains variants 
such as site-directed mutants, proteins crystallized with dif- 
ferent ligands or apoproteins (proteins crystallized without 
ligands). In Chapter 6 we saw that molecular replacement 
is often used to solve the phase problem in protein crystal- 
lography and this has led to a tendency for new structures 


to ‘cluster’ around previously solved structures. Conversely, 
as mentioned above, there is a paucity of structures for pro- 
teins associated with membranes. It is instructive to bear in 
mind that our knowledge of protein structure is limited by 
the constraints of the experimental techniques available for 
structure elucidation. 

A number of useful resources are available from the PDB 
front page (Figure 9.19) including a breakdown of current 
holdings updated weekly, PDB statistics, background infor- 
mation on PDB history and on how PDB curates structure 
data. The PDB is also a very useful repository and link for 
software and programs for structure visualization and anal- 
ysis. Many of these can be downloaded without charge and 
advice/help on downloading are provided. 

The PDB can be searched by keyword (e.g. entering the 
protein name ‘aquaporin’), by PDB ID (e.g. ‘1J4N’ so that 
one can return to a file previously retrieved without exten- 
sive searching) or by author generating a list of files (Figure 
9.20). In this listing the experimental method used (e.g. 
X-ray diffraction) is stated together with the file title and 
structure resolution (e.g. 2.0A in Figure 9.21). Both NMR 
and X-ray diffraction PDB files are organized in the same 
way but it should be borne in mind from Chapter 6 that 
NMR generates a family of structures rather than the single 
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Figure 9.20. Searching the PDB. A list of 4 structure files is found for ‘aquaporin-1’. Note that only one of these was determined by X-ray 
diffraction (2.2 A resolution) while the others were obtained by lower resolution electron diffraction (3.7-3.8 A). This is typical of findings 


with membrane-bound proteins. 


structure obtained from crystals. This means that an NMR 
file will often contain 10-30 structures making these files 
larger than those for a corresponding single structure. We 
can select a file by clicking on its title or choose to down- 
load or open the file or to visualize the structure with Jmol 
viewer (see Section 9.5 for discussion of graphics programs) 
by clicking on appropriate icons. PDB files are all laid out 
in the same way with bibliographic annotation referring to 
the peer-reviewed literature first followed by information 
on the experiment used to generate the file of atomic coor- 
dinates (Figure 9.21). In the case of X-ray diffraction this 
will include crystallographic information such as unit cell 
architecture, space group and strategies used to generate the 
electron density map. The bulk of the file consists of a list- 
ing of individual atoms beginning with the N-terminal amide 
nitrogen followed by the x-carbon, carbonyl carbon and ‘R’- 
group atoms denominated sequentially (Figure 9.21). Then 
the atoms for residue 2 are given, and so on until the car- 
bony] carbon of the C-terminal residue of that chain. If there 
is more than one chain this is denoted as A, B, C, and so on. 
Nonprotein atoms are denoted as heteroatoms. Principally 
this consists of oxygen atoms of water molecules (HOH). 
A small number of atoms are associated with any ligand 
cocrystallized with the protein and these are listed after the 
water molecules. Each atom is associated with three num- 
bers, the atomic coordinates, which are essentially Cartesian 
coordinates denoting the location of the centre of the atom. 


Comparing coordinates for consecutive atoms will show that 
these numbers vary slightly in near-neighbour atoms but if, 
say, atom 1 is compared to atom-20 the numbers alter quite 
considerably. Protein graphics programs accept PDB files 
as inputs and interpret their list of coordinates to allow visu- 
alization of the structure in three dimensions (Section 9.5; 
Table 9.3). 


9.4.3 Specialist Structural Databases 


A key component of hierarchical protein structure classifica- 
tion involves recognition of secondary structure composition 
and patterns of arrangement within protein structure files in 
the PDB. Protein folds are quite robust to changes in pri- 
mary structure with the result that sequence alignments can 
miss similarities in proteins, which may nonetheless adopt 
similar folds and hence possibly have similar function. Spe- 
cialist structural databases allow us to classify proteins into 
three-dimensional classes based partly on their composition 
and arrangement of secondary structure elements. The best 
known of these are CATH and SCOP (Table 9.3). 

CATH classifies proteins based on four levels of structural 
hierarchy (Figure 9.22). Class is automatically assigned as 
mainly «x-helix, mainly ®-sheet, mixture of «-helix/6-sheet 
and proteins with especially low composition of «-helix/B- 
sheet. Architecture is manually assigned based on well- 
known secondary structure arrangements such as -barrels 
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SCALEZ 0.000000 0.010715 0.000000 0.00000 
SCALE3 0.000000 0.000000 0.005540 0.00000 
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ATOM 2 CA META 1 30.243 18.931 2.093 1.00164.58 c 
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ATOM 10 CA ALA à 2 31.344 15.696 3.752 1.00155.49 c 
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Figure 9.21. PDB file for aquaporin 1 (1J4N). (a) The file opens with bibliographic annotation. (b) Portion of PDB file showing part of the 
amino acid sequence of aquaporin 1 and the beginning of the list of atomic coordinates. 
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Figure 9.22. The CATH database. (a) The database can be searched by text for thioredoxin. (b) The classification is based on Class, 
Architecture, Topology and Homology. 
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and three-layer sandwich but ignoring connectivity between 
secondary structure elements. Topology is based on similar 
folds recognized by an algorithm involving both secondary 
structure features and their connectivity. Homology recog- 
nizes sequence identity suggesting a common ancestor. 

SCOP (Figure 9.23) also has a hierarchical structure with 
currently eleven classes dominated by secondary structure 
elements similar to CATH (all «-helix, all B-sheet, mainly 
parallel B-sheets a/B, mainly antiparallel B-sheets « + £, 
multi-domain proteins, membrane/cell surface proteins and 
small proteins). Four classes (coiled coil, low resolution 
protein structures, peptides and designed proteins) are oper- 
ational rather than true classes. This database is searchable 
by text and has the facility to represent the protein as a 
three-dimensional protein graphic with the Chime plug-in 
(Section 9.5.2). 


9.5 PROGRAMS FOR ANALYSIS AND 
VISUALIZATION OF TERTIARY 
STRUCTURE DATABASES 


Much of the information contained in three dimensional 
biomacromolecular structures needs to be visualized in three 
dimensions in the same way that primary structures may be 
viewed as a unidirectional string of building blocks. It is 
necessary to interpret structure files with graphics programs, 
which trace atoms through three-dimensional space. These 
programs create a sort of optical illusion in that they actu- 
ally present a two-dimensional view on the PC screen at any 
given moment in time. However, because the screen can be 
refreshed more quickly than the eye can detect, the impres- 
sion of a three-dimensional view is achieved. The viewer can 
rotate, translate, zoom in/out and choose to represent some 
or all of the atoms/residues in a range of ways. Some very 
powerful graphics programs are available but many of these 
work in specialist environments such as UNIX. The follow- 
ing is a brief description of some graphics programs, which 
run on windows or Macintosh systems. Unless otherwise 
stated, they can be downloaded from the software section of 
the PDB (or from links listed there) free of charge provided 
they are used for academic not-for-profit purposes (Tables 
9.1 and 9.3). Once installed on a PC, these programs can 
be used to view structures downloaded from sources such 
as the PDB. This sometimes allows us to work off-line with 
both program and file on a local computer. 


9.5.1 RasMol/RasTop 


RasMol was written in the 1990s by Roger Sayle and is still 
very useful in that later programs have adopted similar com- 
mand structures. The first thing to do is to select a PDB file 


with the open command of the edit pulldown. By default, 
this represents the protein as a wireframe structure (Figure 
9.24, see colour plate section). The view can be altered by 
one of three methods. 1. Shortcuts with mouse buttons allow 
zooming, translation or rotation of the image. 2. The pull- 
down menus allow variation of the structure representation. 
3. A command line allows typing in of specific commands 
(a full list is given in the manual available with the help pull- 
down). Examples of useful commands are select (which can 
select a specific residue, run of residues or ligand), set back- 
ground (allows selection of different background colours) 
and spacefill. If a portion of the structure has been selected, 
any commands subsequently selected (whether by pulldown 
or command line) are executed only on the selected piece of 
structure. 

RasTop is an updated version of RasMol released in 2000. 
The graphics are generally superior to RasMol but most of 
the commands are identical (Figure 9.25, see colour plate 
section). However, there are more extensive pulldown com- 
mands and there is no command line as such. Instead a 
separate command box is selectable from the edit pulldown 
into which RasMol commands can be entered. From both 
RasMol and RasTop, views can be exported as GIF files with 
the export command and saved for later use in documents 
or presentations. 


9.5.2 Protein Explorer 


Protein Explorer is a more sophisticated derivative of 
RasMol developed by Eric Martz, which can be accessed 
through either Internet Explorer (version 5.5SP2 or later) or 
Netscape (version 4) web browsers (Figure 9.26, see colour 
plate section). The program operates in conjunction with a 
separate plugin program called Chime, downloadable from 
a company called MDL (see URL at the end of this chapter. 
NOTE: It is necessary to register online with MDL before 
downloading Chime). The URL proteinexplorer.org brings 
us to FrontDoor which allows us to open up a PDB file by 
typing a PDB ID into the box and clicking go. A structure 
appears in one of three windows. The image can be modified 
in two ways. Firstly, by typing commands in the commands 
window we can vary the image with RasMol commands. 
Selecting explore more brings us to a window which allows 
us to customize the image using pulldown commands select, 
colour and display. 


9.5.3 PyMOL 


PyMOL was developed by Warren DeLano and is, in princi- 
ple, freely available although the authors do request contri- 
butions to defray their costs. The programs mentioned so far 
are not available in open source form, which means the user 
cannot modify the program to improve its performance. On 
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2. Glycerol uptake facilitator protein GlpF [56898] 
glycerol conducting channel, related to aquaporin 
1. Escherichia coli [56899] (4) EEI 


Figure 9.23. The SCOP database. (a) The database can be searched by text (e.g. aquaporin). (b) The result is a classification of the protein 
from suprafamily down to individual species. Selecting the graphics icon allows us to view the three-dimensional structure by Chime 
(Section 9.5.2). Clicking on the species listed in (b) reveals a listing of all PDB entries for that species. 
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the other hand commercially available open source programs 
are expensive. PyMOL is a generous attempt to provide to 
the international scientific community the computer power 
of an open source program in inexpensive form. It allows 
us to visualize multiple conformations of a single structure, 
to interface with external programs and to generate high- 
quality graphics (Figure 9.27, see colour plate section). As 
with the programs described above, the first step is to use a 
pulldown to open a PDB file. We can then select hide, show 
or label so as to display various aspects of the structure. It 
is possible to open up several PDB files in a single window 
and to select commands for each one in turn. Clicking on 
the file name will ‘hide’ the image for that protein. PPMOL 
allows saving both of individual images and of movies. 


9.5.4 WebMol 


WebMol is a viewer program written in the Java computer 
language by Dirk Walther and designed to act as a browser 
(although the program may also be downloaded to your lo- 
cal PC). Opening the URL brings you to a window in which 
the PDB ID is entered (Figure 9.28). A wireframe structure 
appears by default. Selections can be made on the right hand 
side of this window which allows us to choose which parts 
of the structure we wish to show, how we wish to colour it 
and if we wish to represent surfaces on the structure. A se- 
lect box opens a window which allows us to pick out regions 
of sequence to represent in the structure window. We can 
use the measure box to measure bond lengths, angles, and 
so on. A distance matrix box allows us to view distances 
between any two points in the sequence while a Ramachan- 
dran window allows us to view dihedral angles. A trace 
option draws the three-dimensional structure sequentially. 
A row of horizontal buttons allows us further options to 
adapt the image. 


9.5.5 Swiss-PdbViewer 


Swiss-PdbViewer allows analysis of multiple structures at 
the same time and in the same window (Figure 9.29). In 
contrast to PyMOL though, the structures of related pro- 
teins can be superimposed to reveal structural alignment. 
Mutations can be introduced into the sequence at specific 
points and the effects on structure computed by an energy 
minimization routine. Measurements can also be obtained 
of bond angles and distances. The workspace consists of 
a main window with associated pulldown commands in 
which the principal manipulations are performed. The win- 
dow pulldown allows selection of supplementary windows; 
alignment shows sequence alignments of files loaded, con- 
trol panel allows selection and manipulation of individual 
groups and a Ramachandran window shows a Ramachan- 


dran plot for the whole protein and any parts specifically 
selected (Example 9.3). 


9.6 HOMOLOGY MODELLING 


Proteins with similar primary structures adopt generally sim- 
ilar polypeptide folds. In fact, surprisingly often changes in 
primary structure have little effect on the overall fold of 
polypeptide chains. The redundancy of the PDB is partly 
a reflection of the robustness of protein fold to change 
in primary structure. Although thousands of new three- 
dimensional structures are added to the PDB each year, very 
few of these represent entirely new folds. As a general rule 
of thumb, sequences with a minimum of 50% identity are 
expected to adopt the same fold over 90% of their structure. 
Thus the primary structure ‘encodes’ a three-dimensional 
fold but the precise rules of this code are not as yet fully 
understood and there is considerable plasticity built into 
this relationship. There is major research interest in solving 
the protein folding problem as this could allow us to create 
‘designer proteins’ with novel properties and it could also 
help us to understand more about the increasing number 
of disease states which feature protein mis-folding such as 
Alzheimer’s disease and spongiform encephalopathies. 

There are two main approaches to predicting three- 
dimensional fold from primary structure (Figure 9.30). Ab 
initio modelling involves computing a protein structure by 
estimating the energy of the molecule as it goes through 
various structural perturbations, a process of energy min- 
imization. This requires extensive computer power and is 
limited by our lack of knowledge of the precise details of 
energy interactions within folded proteins and between pro- 
teins and solvent water. Moreover, for most proteins we 
know little about the kinetics of folding which may mean 
that the fold with lowest thermodynamic energy may not 
be kinetically possible. For these reasons, a more empirical 
approach is often taken called homology modelling. This 
predicts three-dimensional structure for a protein of known 
sequence (the target sequence) by modelling it on the known 
three-dimensional structure of a protein of related sequence 
(the template sequence). Generally, this approach is not fea- 
sible below approximately 30% homology. 

A useful parameter in homology modelling is the root 
mean square deviation (rmsd) of the model. 


FE, 2 
rmsd? = 2 Ce = 0 (9.1) 


where xy is the location of n a-carbon atoms in the target 
and Xy is the location in the template. This is a numerical 


reflection of the average deviation of the model from 
template. The rmsd has a value of 0 for identical structures 
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Figure 9.28. Viewing structures with WebMol (a) PDB ID can be entered on-line or downloaded files can be found by browsing. (b) 


Structure of human aquaporin 1 (1J4N). The distance matrix window shown allows measurement of distances between selected residues. 
Similarly, Ramachandran diagrams allow measurement of dihedral angles around specific residues. 
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Figure 9.29. Viewing structures with SwissPdbViewer. (a) Opening the file for human aquaporin 1 (1J4N) produces a view of the protein 
in wireframe. Some selected residues have their van Der Waals radii highlighted in the model. The ‘align’ window is opened from the 
windows pulldown and in this case Args162 is selected. (b) Residues are selected from the control panel which is also opened from windows 
pulldown. For other possibilities with SwissPdbViewer see Figure 9.32 below. 
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Example 9.3 Viewing Aquaporin with Swiss-Pdb Viewer 


Entering the accession code 1j4n into the PDB search box allows download of a structure file for bovine aquaporin 
1 determined to 2.2 A resolution by X-ray diffraction (Sui et al., 2001). Selecting ‘open’ in Swiss-PdbViewer allows 
navigation to and opening of this file. The structure is viewed with backbone rendered in solid 3D (‘Display’ pulldown). 
The view is through the pore and two molecules of the ligand (nonylglucoside) with which the protein was cocrystallized 
are clearly visible. 


Sr LI4N (1045 x 750) 


Selecting residues 39—42 on the control panel allows viewing of the dihedral angles for each of these residues in the 
Ramachandran Plot option of ‘window’ pulldown: 


(Continues)_! 
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M (Continued) 


It is clear that the values are mostly in the allowed region of the plot. 


(in theory) but the best models can have values of up to 
~0.5Å, approximately the variation found between indi- 
vidual structure determinations of a single protein. A higher 
number for rmsd reflects increasing deviation between the 
model and template and suggests that the model should be 
viewed with increasing caution. Because n varies with pro- 
tein size, rmsd values are not directly comparable between 
proteins of differing size but methods for normalizing for 
this effect are available. 


9.6.1 Modelling Proteins from Known 
Homologous Structures 


The first step in homology modelling is to identify struc- 
turally conserved regions (SCRs) in the two proteins by 


sequence alignment. The atomic coordinates for SCRs are 
then copied from the PDB file of the template sequence and 
used to construct a partial model for the target. In general, in 
regions of high sequence identity this gives a good overlap 
between positions of -carbons but positions of side-chains 
and nonconserved residues will differ. Amino acids within 
SCRs differing between the two sequences are now replaced 
with the correct residue taken from a structure library. Gaps 
between the SCRs must now be filled in with loop sequences 
ina process known as loop searching. These gaps are usually 
<12 residues long and represent surface regions of the pro- 
tein. Loop searching uses the geometry at either side of the 
gap to search the PDB for amino acid sequences of similar 
sequence and geometry containing approximately the same 
number of residues as the gap. The atomic coordinates of 
the loop are taken from the PDB and included in the model. 
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Figure 9.30. Modelling three-dimensional protein structures from sequence. Two general approaches are possible. (a) Ab initio modelling 
is based on calculating a low energy structure based on interactions within the protein and between the protein and solvent. (b) Homology 
modelling calculates a structure based on a pre-existing structure in the PDB for a protein (template) showing sequence similarity to the 
target protein. SCRs are found by sequence alignment and coordinates of these are obtained from the PDB. Gap regions are modelled by 
searching the PDB for loops of similar length, geometry and sequence. The coordinates for these are similarly added to the model. The 
sequence of loops is converted to that of the target protein and a final structure calculated by energy minimization. 
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D4 


Example 9.4 Homology Modelling of the Motor Componenent of Dynein Heavy Chain 


Eukaryotic cells have three distinct types of motors: dyneins, myosins and kinesins which allow movement along the 
cytoskeleton in a manner driven by ATP hydrolysis. This type of movement underlies change of shape and is responsible for 
movement of organelles within cells. These proteins are all part of the P-loop ATPase protein family. Although otherwise 
structurally diverse, there is considerable sequence conservation in the part of the motor (‘motor unit’) involved in ATP 
hydrolysis and development of force. Sensitive sequence alignment comparisons suggested similarities between the motor 
unit part of dynein and members of the AAA class of chaperone-like ATPases. From this it was deduced that the core of 
the motor unit consisted of 6 AAA-like motifs each of approx. Mr 35-40 kDa. 

Three AAA template proteins were selected based on sequence alignment and a homology model created as follows: 


A) View shown in ribbon representation with AAA-like motifs labeled D1-D6. B) Spacefill. C) Side-view of B). 


The sequence of the loop now must be corrected to that of 
the target sequence. Loop searching is performed in this way 
for all loops. The model resulting from this process is then 
energy-minimized resulting in a final model. 


9.6.2 Automated Modelling 


The process of homology modelling involves quite time- 
consuming decisions on identification of templates, loop 
searching and varying modelling parameters all of which 
require the intervention of expert knowledge on the part of 
the researcher. A quicker approach to homology modelling 
is to use automated on-line resources which will automat- 
ically align the target sequence on the template in three 
dimensions based on the latter sequence’s known structure 
in the PDB. An important caveat to this approach is that the 
resulting model may contain significant errors precisely be- 
cause all choices are made automatically. Such models must 
therefore be interpreted with caution. Automated modelling 
is either performed with a program accessible on-line, for 
example MODELLER or by accessing a server which takes 
the user through the steps required (Table 9.3). In the latter 
case, the target sequence is entered in a box along with the 
researcher’s e-mail address (Figure 9.31, see colour plate 


section). If modelling is successful, results are e-mailed in 
the form of a file viewable with the programs described 
in Section 9.5 above. The Swiss-PdbViewer is especially 
useful in that it detects imperfections in the model such 
as deviant dihedral angles or clashes between side-chains 
(Figure 9.32). Swiss-PdbViewer can also perform de novo 
modelling of loops, multiple alignment of protein se- 
quences/structures and generation of phylogenic trees. The 
reader is referred to Gale Rhodes’ excellent on-line tutorials 
for more information on these topics (see References). The 
structures and sequences of both the template and target 
are shown in the same windows and options in the select 
pulldown allow identification of clashes, unfavourable bond 
angles and other indications of the model’s quality. On-line 
homology modelling servers carry out a sequence alignment 
to find the most similar sequence in the PDB. The target se- 
quence is then modelled on this template using a program 
such as SWISS-MODEL (ExPASy site), ESyPred3D (Uni- 
versity of Namur, Belgium) and Geno3D (CNRS, Lyons, 
France). The last-named site is especially interactive in that 
the user can view alignments, select various templates and 
model on multiple templates (Figure 9.33). The program 
generates multiple structures and checks their stereochemi- 
cal appropriateness with PROCHECK. 
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Figure 9.32. Using Swiss-PdbViewer to compare homology model to template. Ramachandran diagrams are shown for (a) Human aquaporin 
1 (PDB ID 1J4N) and (b) The model for canine aquaporin generated by ESyPred3D shown in Figure 9.31. Note the overall similarities 
between the range of values occupied and that most of the nonglycine residues outside the three heavily-populated parts of the diagram are 
common to both structures. These similarities, combined with absence of side-chain clashes, suggest that this model is structurally plausible. 
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(a) | Address [Æ] http://geno3d-pbl..bep. fr/cg-bin/geno3d_sutomat. pPpsge=/GENOSD/gena34_home. html 


os Pole Blolaformatique Lyonnais 
Larus “a Geno D] 


3D is the IBCP contnbution to PBIL in Lyon, France 
[HOME] [GENO3D] [HELP] [REFERENCES] (NEWS) [NPS@)] [SuMo] (PBIL} 


hursday, February 17t GENOS3D is now running on Linux cluster (see news 


GENO3D Release 2 : AUTOMATIC MODELING OF PROTEINS THREE-DIMENSIONAL STRUCTURE 


[Abstract] [GENO3D help] [Original server] 
Database : | Non-redundant protein sequences =] 
Sequence | name (optional) : 


Paste protein 1 sequence below : help 


masefkkklf vravvaefla milfvfisig salgfnypvr nnqtagaaqd 
nvkvslafgl siatlaqsvg hisgahinpa vtlglliscq isilravmyi 
iaqcvgaiva tailsgitss Ipdnsigrne lapgvnsgag igieiigtlq 
ivicviattd rrrrdiggsg plaigisval ghllaidytg cginparstg 
ssvithnfkd hvifwvgpfi ggalavliyd filaprssdl tdrvkvetsg 
iqveeyeldgd dinsrvemkp k 


Number of PSI-BLAST run [2 2] 


SUBMIT | CLEAR | 


b) > NPSA 3DSEQ| pdbij4nas| pdbij4na MEMBRANE PROTEIN 
19-OCT-01 1J4N CROSS_PDB = 
Length = 249 


Score = 296 bits (759), Expect = 2e-79 
Identities = 235/249 (94%), Positives = 243/249 (97%) 


Query: 1 MASEFKKKLFWRAVVAEFLAMILFVFISIGSALGFNYPVRNNQOTAGAAQDNVKVSLAFGL 60 
MASEFKKKLFWRAVVAEFLAMILF+FISIGSALGF+YP+++NOT GA QDNVKVSLAFGL 
Sbjct: 1 MASEFKKKLFWRAVVAEFLAMILFIFISIGSALGFHYPIKSNQTTGAVODNVKVSLAFGL 60 


Query: 61 SIATLAQSVGHISGAHLNPAVTLGLLLSCQISILRAVNYIIAQCVGAIVATAILSGITSS 120 
SIATLAQSVGHISGAHLNPAVTLGLLLSCQIS+LRA+NYIIAQCVGAIVATAILSGITSS 
Sbjct: 61 SIATLAQSVGHISGAHLNPAVTLGLLLSCQISVLRAINYIIAQCVGAIVATAILSGITSS 120 


Query: 121 LPDNSLGRNELAPGVNSGOQGLGIEIIGTLQOLVLCVLATTDRRRRDLGGSGPLAIGLSVAL 180 
LPDNSLG N LAPGYNSGQGLGIEIIGTLOLVLCVLATTDRRRRDLGGSGPLAIG SVAL 
Sbjct: 121 LPDNSLGLNALAPGVNSGQGLGIEIIGTLOLVLCVLATTDRRRRDLGGSGPLAIGFSVAL 180 


Query: 181 GHLLAIDYTGCGINPARSFGSSVITHNFKDHUIFWVGPFIGGALAVLIYDFILAPRSSDL 240 
GHLLAIDYTGCGINPARSFGSSVITHNF+DHUIFUVGPFIG ALAVLIYDFILAPRSSDL 
Sbjct: 181 GHLLAIDYTGCGINPARSFGSSVITHNFQDHUIFWVGPFIGAALAVLIYDFILAPRSSDL 240 


Query: 241 TDRVKVUTS 249 
TDRVKVUTS 
Sbjct: 241 TDRVKVUTS 249 


Figure 9.33. On-line homology modelling with Geno3D. (a) The sequence of canine aquaporin is entered in the sequence box and this 
is submitted for PSI-BLAST. (b) A listing of alignments is returned. The user selects preferred template, in this case human aquaporin 1 
which shows 94% identity with the query sequence. (c) Three models were created using a structure human aquaporin 1 (PDB ID 11H5) 
determined by electron diffraction as a template. These were returned by e-mail and viewed with PYMOL. Models have rmsd values in the 
range 0.52-1.05 A compared to the template. (d) Plot of deviation along the chain confirms that the bulk of the deviation in structure is at 
the N- and C-termini. 
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Figure 9.33. (Continued) 
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9.6.3 Applications of Homology Modelling to 
Drug Discovery 


A major application of homology modelling is in ratio- 
nal drug design to create new families of drugs based on 
structure-function knowledge rather than synthesizing and 
screening random variants of molecules known to be bioac- 
tive. This involves docking variants of drug structures into 
binding sites of protein models in silico with the objective 
of selecting candidate structures for synthesis. Not alone 
is a model of the target protein required but the effect of 
interaction of the ligand on the protein structure must also 
be modelled. Commercial programs for accomplishing this 
are available at the CCDC including GOLD (protein-ligand 
docking), SuperStar (predicts protein-ligand interactions us- 
ing experimental data) and Relibase (searches for protein- 
ligand complexes in PDB and proprietary databases). A pro- 
gram called AutoDock written originally by David Goodsell 
and Arthur Olson of the Scripps Institute is downloadable 
from the Software section of the PDB and models docking 
of flexible ligands to proteins. 
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Chapter 10 


Proteomics 


Objectives 


abundance. 


After completing this chapter you should be familiar with: 


e The principal physical methods used in proteomics. 

e What type of questions can proteomics answer? 

e What is the relationship between proteomics and other ‘-omic’ technologies? 

e Chemical tagging methodologies and their use to detect subproteomes. 

e Some methods tell us about abundance while others reveal changes in proteins with no net change in 


Genetic material contains genes many of which encode pro- 
teins. This complement of genes is referred to as the genome 
and can be regarded as a relatively static set of instructions 
(Figure 10.1). When transcribed, these genes direct synthe- 
sis of a dynamic population of mRNA molecules that will 
vary with the requirements of the cell at any given time. 
This is referred to as the transcriptome. When translated, 
the population of mRNAs directs synthesis of a population 
of proteins that can be further varied by post-translational 
processing such as glycosylation or proteolysis. This is the 
proteome, the total complement of proteins expressed in 
a cell at any given point in time and under a given set of 
growth conditions. Like the transcriptome, the proteome is a 
dynamic quantity that can change in response to nutritional 
status, stress and specialization of the cell or with stage 
of the cell cycle. Proteomics is the study of the proteome 
using methodologies capable of analyzing large numbers 
of proteins simultaneously. Three main types of techniques 
have contributed to development of the field of proteomics; 
electrophoresis, mass spectrometry and chip technologies. 
These make possible high-throughput analysis capable of 
revealing alterations in presence or level of individual pro- 
teins. These approaches have contributed to development 
of subspecialties such as descriptive proteomics (catalogu- 
ing of proteins), functional proteomics (focusing on the dy- 
namic state of the proteome) and interaction proteomics 
(exploring protein-protein interactions). This chapter de- 
scribes these technologies and some of their current appli- 
cations. The reader should note that some common chem- 
ical tools/approaches are used across these methodologies 
(e.g. use of reducing and alkylating agents; isotope/fluor- 
labelling; reactions of functional groups with groups on 
stationary phase beads). Traditional proteomics aimed to 


catalogue and quantify relative abundance of proteins but 
increasingly interest is shifting to exploring differences in 
the properties of proteins associated with modest struc- 
tural changes (e.g. phosphorylation, glycosylation and oxi- 
dation). Often, these changes do not alter the relative abun- 
dance of target proteins. It should also be remembered that 
the availability of large-scale sequence data from genome 
projects (Chapter 9) is an essential factor for automatic iden- 
tification of specific proteins within the proteome. 


10.1 ELECTROPHORESIS IN 
PROTEOMICS 


The theory and practice of electrophoresis has already been 
described (Chapter 5). In the 1970s Patrick H. O’ Farrell 
(then at the University of Colorado) combined isoelectric 
focusing (IEF) in one dimension with SDS PAGE in a sec- 
ond dimension to separate protein mixtures into individ- 
ual spots (Sections 5.3 and 5.5). This technique is called 
two-dimensional SDS PAGE (2-D SDS PAGE). However, 
this approach required preliminary formation of a soluble 
pH gradient that tended to migrate during the experiment 
towards the cathode (‘cathodic drift’). It also proved dif- 
ficult to achieve reproducible pH gradients. For these rea- 
sons, 2-D SDS PAGE had limited robustness for general 
use. As mentioned earlier (Section 5.5.1), in the 1980s a 
method was developed for immobilizing pH gradients on 
solid supports such as plastic strips. This made 2-D SDS 
PAGE a reliable and reproducible proposition and it is now 
the principal electrophoresis format used in modern pro- 
teomics (Figure 10.1). Advantages of this method include 
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Figure 10.1. The genome of most organisms consists of DNA. This 
is transcribed into the transcriptome which varies with cell type 
and metabolic status of cells. The proteome is the complement of 
proteins expressed at any given moment in time. The metabolome 
is the population of small molecules expressed and is a reflection 
of the metabolic status of the cell 


that it makes up to thousands of proteins amenable to exami- 
nation in a single experiment and that separated proteins are 
made available for further analysis by procedures such as 
N-terminal microsequencing, MS or immunoblotting. How- 
ever, it should be borne in mind that some categories of 
proteins are often under-represented in 2-D SDS PAGE sep- 
arations including membrane, low abundance and extreme 
pI proteins. Except for the simplest organism, it is therefore 
not yet possible to separate whole proteomes in a single 
experiment. For this reason, we frequently concentrate in- 
stead on subproteomes. These can be selected in various 
ways including structural (e.g. organelles) and functional 
(e.g. affinity-purified, glycoproteins, phosphoproteins etc.). 
Proteomics has harnessed and developed a wide range of 
selection or detection methodologies. 


10.1.1 Two-Dimensional SDS PAGE 


In a simple bacterial cell such as E. coli there are thought to 
be more than 2000 individual proteins while in eukaryotes 
the number may be as high as 100000. Each of these pro- 
teins has a distinct set of physical and structural properties 
that facilitates their separation in electrophoretic systems. 
Although electrophoresis gives high-resolution separation, 
it is usually difficult to distinguish more than 100 or so indi- 
vidual species in a single experiment such as SDS PAGE or 
nondenaturing electrophoresis. For this reason, most elec- 
trophoretic separations are performed on samples that have 
been previously purified in some way (e.g. by chromatog- 
raphy; Chapter 2). We have previously seen how interfac- 
ing of two high-resolution techniques is possible with MS 


analysis (Chapter 4) allowing greater resolution of com- 
plex mixtures than is possible with single techniques alone. 
Combination of the electrophoretic procedures of IEF and 
SDS PAGE in 2-D SDS PAGE facilitates virtually complete 
separation of individual polypeptides even from crude cell 
extracts. 


10.1.2 Basis of 2-D SDS PAGE 


In current usage, IEF separation is carried out in 
commercially-available or laboratory-prepared gel strips 
containing immobilized pH gradients (IPG; Figure 10.2). 
IEF strips have relatively large loading capacity (100- 
200 ug) and result in highly reproducible separation of pro- 
teins. It is possible to prepare strips in the laboratory so as 
to select a specific pH range and commercial strips are also 
available in a wide range of pH intervals and even covering 
very narrow increments. The usual approach is to carry out 
preliminary separations across a broad pH range and then 
to narrow the range so as to improve resolution of individ- 
ual spots. After focusing, the IEF strip is placed along the 
top of an SDS PAGE gel and overlaid to hold it in place 
(Figure 10.3). SDS PAGE electrophoresis is then performed 
orthogonally (i.e. at 90°) to the axis of the IEF dimension. 
When the SDS PAGE gel is stained (e.g. silver staining), 
individual proteins are visible as spots on the gel. Proteins 
separate in the first dimension based on differences in their pI 
values whilst, in the second dimension, they separate based 
mainly due to differences in their molecular mass. It should 
be noted that, since both dimensions are usually carried 
out under denaturing conditions, separation of individual 
polypeptides is achieved in this experiment (Example 10.1). 


10.1.3 Equipment Used in 2-D SDS PAGE 


Commercially available IEF strips containing an immobi- 
lized pH gradient are available in small, intermediate or large 
formats that match the dimensions of SDS PAGE appara- 
tuses. The larger the format, the more spots are resolvable 
but the slower the electrophoretic separation and the more 
protein is required. Thus, small format gels are often used 
initially to find the optimum pH range and larger format 
gels may be used to optimize spot resolution. Strips are fit- 
ted into a commercial IEF apparatus with a built-in power 
supply which facilitates handling of multiple samples under 
fixed temperature conditions (Figure 10.3) and IEF is per- 
formed as described earlier (Section 5.5.2). IEF strips are 
equilibrated in SDS PAGE sample buffer and laid across 
the top of the stacking gel of the SDS PAGE system. The 
glass plate used for the SDS PAGE gel is specially shaped 
to provide a convenient trough into which the IEF strip may 
be placed. The IEF strip is overlaid with warm agarose that 
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Figure 10.2. Isoelectric focusing with (a) free and (b) immobilized ampholytes. Ampholytes act as local buffers. It is necessary first to 
prefocus free ampholytes forming a soluble pH gradient. If ampholytes are covalently attached to a gel network on a plastic strip in order of 
increasing pl, then the gradient is immobilized. In practice a polyacrylamide gel is immobilized on plastic strips and is swollen by addition 


of sample protein solution in 8 M urea 


sets as it cools, fixing the strip in place. While vertical SDS 
PAGE gels are most commonly used, horizontal gels are also 
possible as described in Prof A. Gérg’s on-line manual listed 
in the bibliography. Flat-bed IEF gels may be treated in the 
same manner as gel strips except that individual tracks need 
to be excised with a sharp scalpel and treated as described 
above. 


10.1.4 Analysis of Cell Proteins 


2-D SDS PAGE is based on the reasonable supposition that 
it is unlikely that any two polypeptides will have exactly 
the same pI and molecular mass. Experimentally, proteins 
that appear as a sharp band on either IEF or SDS PAGE 
will appear as spots in 2-D SDS PAGE analysis. In practice, 
it is sometimes difficult to distinguish closely migrating 
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Figure 10.3. 2-D SDS PAGE. Proteins are focused by isoelectric focusing (IEF) in an immobilized pH gradient on a plastic strip. In this 
case the pH range is 5-8 but other ranges are available. The strip is then soaked in SDS buffer and applied to an SDS PAGE gel. After 


staining, proteins appear as ‘spots’ 


spots completely but this technique certainly resolves the 
majority of polypeptides. Because of this, it has been used 
widely for comparison of complex protein mixtures from 
crude cell extracts. For example, it is possible to compare 
extracts of deletion-mutant bacteria to those of correspond- 
ing wild-type cells and to identify individual spots absent in 
the mutant though present in the wild-type. It is also possi- 
ble to look at a single cell-type at different stages of the cell 
cycle to identify polypeptides only expressed at particular 
stages of the cycle. 

A feature of 2-D SDS PAGE analysis is that proteins are 
detectable across three or four orders of magnitude in a sin- 
gle mixture. In order to access this quantitative information 
image analysis systems are used. These convert the image of 
the stained gel into digital information suitable for storage 
and comparison of separations (Figure 10.4). In practice, it 
is found that some proteins are expressed very abundantly 
in cells giving large, densely-staining spots while other pro- 
teins may be barely detectable, even with silver staining. 
For meaningful results, it is usually necessary to run repli- 
cate gels and to use these to generate an ‘average gel’ for 


test and control separations. These separations are inspected 
carefully to identify nonsymmetric, overlapping or saturated 
spots. In comparing test to controls, either commercially 
available internal standards or spots common to the two 
separations can be used to adjust for loading differences. 
Alternatively, prior to separation, test and control mixtures 
can be labelled with distinct fluors (Chapter 3). These are 
then mixed and run on a single gel in a process called differ- 
ence gel electrophoresis (DIGE) (Figure 10.5). Excitation at 
different wavelengths allows us to distinguish between test 
and control samples and fluorescence intensities are used 
to quantify specific proteins. Because the data from 2-D 
SDS PAGE are digitized by image analysis systems, they 
can be stored electronically and used to create a database 
for specific cell types. These databases annotate spots with 
their (Mr, pI) and identity (if known). Such databases can 
be made more widely available to the international scientific 
community through the Internet (see link to such databases 
at: http://biohttp://www.net/detail-364.html). 

A key activity in proteomics is identification of individual 
proteins. Usually, spots of interest can be excised, digested 
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Example 10.1 Use of 2-D SDS PAGE to identify proteins affected by fluoride toxicity in rat kidney 


Fluoride is often added in small quantities to drinking water to help prevent tooth decay. However, in larger doses, 
fluoride is known to be toxic to cells. Kidney is the main organ through which fluoride is detoxified and extensive fluoride 
intoxication is know to cause lesions in kidney. These workers exposed rats to a reasonably high dose of fluoride and 
then used 2-D SDS PAGE combined with peptide mapping and MS methods to identify changes in levels of proteins. 
After 2-D SDS PAGE separations, gels were stained with a dye called Coomassie Brilliant Blue (more sensitive staining 
is possible with silver). The patterns obtained are shown below. 


- FLUORIDE + FLUORIDE 


PHO ——-___ pi Range ——— pH PH10 


pH Range —— pH 


Spots were quantified using image analysis software and it was discovered that 141 spots were at least twofold up- 
regulated, 8 decreased at least twofold and 280 were unchanged. Of these 12 up-regulated and 2-down-regulated proteins 
were identified by MALDI-TOF MS. Several up-regulated proteins were associated with intermediary metabolism such 
as the urea cycle, fatty acid and amino acid metabolism. 


Reproduced from Xu, H., Hu, L.-S., Chang, M., Jing, L., Zhang, X.-Y. and Li, G.S. (2005) Proteomic analysis of kidney in fluoride-treated rat. Toxicology 


Letters, 160, 69-75. Copyright ©) 2005, Elsevier. 


with trypsin and tryptic fragments analysed by tandem MS 
Robotic workstations are now available which largely auto- 
mate this activity to allow high-throughput analysis of the 
proteome. In organisms for which genomes are available 
(e.g. human, yeast, rat, etc.) tryptic fragments are automati- 
cally matched to known proteins. Even in species for which 
full genomes are as yet unavailable, individual sequences 
sometimes exist in databases or else there is sufficient se- 
quence conservation for a proportion of tryptic peptides to 
identify a plausible source for the pattern obtained. Another 
approach to spot identification is to use western blotting 
with immunodetection of specific spots (Section 5.10.2). 


10.1.5 Free Flow Electrophoresis 


In introducing electrophoresis (Chapter 5) it was pointed out 
that most modern electrophoresis procedures are performed 
in some type of gel matrix. Free flow electrophoresis is an 


exception to this as the electrophoresis is performed in a 
gel-free context. The principal of free flow electrophore- 
sis is that proteins are allowed to migrate in an electric 
field across which a continuous laminar flow of buffer solu- 
tions is directed (Figure 10.6). The directions of the electric 
field and buffer flow are orthogonal (i.e. at right angles) to 
each other. The buffer is collected in the 96 capillaries of 
a collection plate at the end of a separation chamber, al- 
lowing continuous fractionation of complex mixtures. By 
varying the composition of the separation buffer, three dis- 
tinct separation principles can be used: IEF (Section 5.7); 
zone electrophoresis (Section 5.1.2); isotachophoresis (Sec- 
tion 5.9.1). In contrast to most other electrophoresis formats, 
in free flow electrophoresis sample is loaded continuously. 
This means small samples (~50 ul) can be loaded in a few 
minutes while large samples (~100 ml) can be loaded in a 
day. The technique can be used to separate peptides, proteins 
organelles and even whole cells and it is possible to vary the 
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Figure 10.4. Image analysis in 2-D SDS PAGE. Spot intensities 
can be converted into a digital image from which quantitative in- 
formation can be obtained. In this representation, individual spots 
are represented as three dimensional peaks. Care must be taken 
to ensure spots are not ‘saturated’ (i.e. with flat tops leading to 
underestimation of protein), are not overlapping and are symmetric 


buffer composition to create denaturing/nondenaturing con- 
ditions suitable for the separation principle being used. In 
the context of proteomics, free flow electrophoresis allows 
prior fractionation of complex samples which can then be 
further fractionated or analysed, for example by reversed 
phase HPLC or tandem MS. 
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Figure 10.5. Difference gel electrophoresis (DIGE). Two samples 
(test and control) to be compared are labelled separately with dis- 
tinct fluorescent dyes (e.g. green and red). They are then mixed and 
separated on a single 2-D SDS PAGE gel. Intensity of spots shows 
if they are equal in both samples (yellow), up-regulated (green) or 
down-regulated (red). A colour version of this figure appears in the 
plate section of this book 
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Figure 10.6. Free flow electrophoresis. Sample is loaded at the 
sample inlet and electrophoreses through the separation buffer. Indi- 
vidual components separate as shown because they are differentially 
affected by the electric field. They are collected in 96 capillaries 
at the end of the separation chamber before passing to a fraction 
collector. By varying the composition of the buffer, sample loading 
and electrophoresis variables, conditions can be chosen to separate 
most sample components from each other 


10.1.6 Blue Native Gel Electrophoresis 


Some 25% of eukaryote proteins are membrane-bound and 
some of these precipitate under the conditions used in 2-D 
SDS PAGE leading to their under-representation. Moreover, 
many of these proteins are associated with other proteins in 
multi-protein complexes such as, for example, the mito- 
chondrial electron transport chain. Revealing such protein- 
protein interactions is a key target of proteomics called inter- 
action proteomics. Blue native gel electrophoresis separates 
individual protein complexes from each other in a nondena- 
turing first dimension followed by separation of individual 
protein components of complexes under denaturing condi- 
tions in the second dimension (Figure 10.7). The first elec- 
trophoresis dimension consists of a gradient polyacrylamide 
gel (e.g. 5-12% acrylamide). Before loading samples in the 
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Figure 10.7. Blue native gel electrophoresis. (a) Mixtures of pro- 
tein complexes are first separated under nondenaturing conditions. 
(b) A track excised from the first dimension is overlaid with agarose 
under denaturing conditions and electrophoresed through a second 
gel, separating protein components of each complex. Four com- 
plexes (1-4) are shown 


wells of the gel, they are mixed with Coomassie Brilliant 
Blue G-250 (hence ‘blue native gel electrophoresis’). Some- 
what like SDS in conventional SDS PAGE, this dye confers 
negative charges on all proteins but, unlike SDS, the dye 
is not denaturing. After separation, gel sections containing 
visible protein complexes may be excised for second dimen- 
sion analysis. Alternatively, intact electrophoresis lanes may 
be excised from the gradient gel. These are laid on top of an 
SDS PAGE gel and overlaid with hot agarose solution con- 
taining SDS and 2-mercaptoethanol. When the agarose sets, 
denaturing electrophoresis is performed which separates the 
individual proteins in complexes. Separated proteins can be 
quantified and identified with Coomassie Brilliant Blue or 
by immunoblotting. 


10.1.7 Other Electrophoresis Methods Used 

in Proteomics 

While 2-D SDS PAGE is the principal electrophoresis for- 
mat used in proteomics, capillary electrophoresis (CE) has 
also found application. This technique has been described 
in detail in Section 5.9. Because of the loading limitations 
of CE, it is largely an analytical technique and lacks the 
preparative dimension of 2-D SDS PAGE. 


10.2 MASS SPECTROMETRY 
IN PROTEOMICS 


The physical basis and some applications of MS have been 
described (Chapter 4). Availability of ionization methods 
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such as ESI and MALDI has made MS of proteins and 
peptides an increasingly routine activity. The importance 
of this was recognized by the award of the 2002 No- 
bel Prize in Chemistry to John Fenn and Koichi Tanaka 
for their development of ‘soft’ ionization methodologies. 
Use of MS to identify electrophoretically-separated pro- 
teins has already been described (Section 10.1.4). However, 
quantitative data can be obtained from MS separations 
using tagging methodologies to compare tryptic pep- 
tides from paired samples (e.g. test and control; Exam- 
ple 10.2). Because MS methodologies do not yet allow 
discrimination of every protein in the proteome, selection 
methodologies are necessary to select subproteomes for 
analysis. 


10.2.1 Tagging Methodologies Used 
in MS Proteomics 


Digestion with trypsin usually generates dozens of peptides 
per protein. Thus, the complete human genome could poten- 
tially generate ~900 000 individual peptides (Table 10.1). 
In principle, it ought to be possible to compare this popu- 
lation of peptides from paired samples so as to provide a 
complete proteomic comparison of relative abundance. In 
practice, this approach results in the dataset being domi- 
nated by a large number of redundant peptides with con- 
sequent loss of sensitivity especially for low-abundance 
proteins. One way of avoiding this is to select from the pro- 
teome using affinity methods. Thus, IMAC (Section 2.4.6) 
can be used to select for histidine-containing peptides, thiol 
exchange chromatography selects for cysteine-containing 
peptides and lectin affinity chromatography selects glycosy- 
lated peptides. Alternatively, proteins can be labelled with a 
chemically identical tag. The samples could be test/control 
and the labels can be designed so as to be specific for in- 
dividual amino acid side-chains or other structural features 
(Figure 10.8). The most common experiment of this type is 
stable isotope tagging. Proteins are labelled with a stable 
isotope affinity tag that is of two types. One tag is ‘light’ 
while the other is ‘heavy’ (Figure 10.8). Since MS detects 
differences in mass, identical peptides will appear as pairs 
separated exactly by the mass difference of the two types of 
tag. Proteins from both samples are enzymatically digested, 
labelled and mixed. Samples are automatically separated by 
C-18 reversed phase HPLC (Section 2.6) coupled to tandem 
MS (Section 4.3.1). This allows comparison of test to con- 
trol in a single separation and comparison of corresponding 
light and heavy tagged peptides allows accurate quantifica- 
tion of relative abundance in the starting proteome. As with 
2-D SDS PAGE, some proteins have identical abundance in 
the test and control samples whilst others are up- or down- 
regulated. 
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Example 10.2 Isotope-coded affinity labels in quantification of lysosomal proteins during in vitro 
differentiation 


A human cell line HL-60 is widely used as an in vitro model for cell differentiation as it can be induced to convert into a 
morphologically-distinct monocyte-like state on exposure to 12-phorbol 13-myristate acetate (PMA). The differentiated 
cell has increased cell adherence so it was of interest to study the population of proteins expressed in membrane 
compartments including microsomes. Control cells and those treated with PMA were used as a source of microsomal 
fraction. Proteins were tagged either with the isotopically light (d0) or heavy (d8) variant of the ICAT label shown in 
Figure 10.8, mixed and digested with trypsin. After cation exchange chromatography, labeled peptides were selected by 
avidin affinity chromatography and separated by reversed phase microcapillary HPLC followed by online electrospray 


tandem MS. The results are illustrated; 


QUANTIFICATION 


Proteins labeled 1 and 2 in C were identified in the sequence databases using the program SEQUEST and quantified by 
determination of ratio of light:heavy isotopes with the program XPRESS as follows: 
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In this way 491 proteins could be identified and their abundance ratios determined. 


Reproduced from Han, D.K., Eng, J., Zhou, H. and Aebersold, R. (2001) Quantitative profiling of differentiation-induced microsomal proteins using 
isotope-coded affinity tags and mass spectrometry. Nature Biotechnology, 19, 946-951. Copyright © 2001, Elsevier. 


Table 10.1. Calculated number of tryptic peptides with amino 
acids targeted by tagging and percentage coverage of the human 
proteome 


% total % genome 

Peptides Total number peptides Coverage 
Total tryptic 892584 100 100 

Peptides“ 
N-/C-terminal 52816 5.9 100 
Cys-containing 237111 26.6 96.1 
Met-containing 227773 25.5 98.9 
Trp-containing 157 538 17.6 91.5 
His-containing 274576 30.8 973 
His/Cys- 95238 10.7 73.2 

containing 


“Peptides end in Arg or Lys but are not followed by Pro. (Adapted 
from Zhang et al., 2004). 


10.2.2 Isotope-Coded Affinity Tagging (ICAT) for 
Cysteine-Containing Proteins 


Stable isotope tagging has most commonly been used to 
label cysteine side-chains of proteins. This selects approx- 
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imately 26% of tryptic peptides from the human proteome 
while covering 96% of total proteins (Table 10.1). As only 
cysteine-containing peptides are analysed, more proteins 
can be identified in a given analysis time than if all tryp- 
tic peptides were analysed. The traditional tag used in this 
experiment has three components (Figure 10.8). One part 
contains a thiol-specific chemical group such as iodoacetic 
acid/iodoacetamide. This is connected to a linker that can 
either contain a ‘heavy’ isotope (e.g. deuterium) or a ‘light’ 
isotope (e.g. hydrogen). The third component is biotin which 
can bind to avidin. Such tags are called isotope-coded affin- 
ity tags (ICAT) as the isotope identity allows coding of the 
two samples (e.g. as test and control) while biotin allows 
affinity selection for labeled proteins. Samples are exposed 
to ICATs for long enough to label all -SH groups. The two 
extracts (one now labelled with the light tag and the other 
with the heavy) are then mixed and digested with trypsin. 
Cysteine-containing peptides are selected with avidin and 
analysed by LC-MS-MS Some limitations associated with 
the traditional tag shown in Figure 10.8 are that heavy/light 
peptides may not have identical retention on HPLC, 
some nontagged peptides can bind to avidin nonspecifi- 
cally and the biotin moiety makes labelled peptides quite 
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Figure 10.8. Isotope-coded affinity tag (ICAT). (a) ICATs contain moieties for affinity selection (e.g. biotin specifically binds avidin), 
isotope-coding (e.g. a linker where isotope X can be added) and targeting of protein groups (e.g. a thiol-reactive iodoacetamide). The 
isotope-coded region can contain light (e.g. hydrogen) or heavy (e.g. deuterium) isotopes at X. (b) Test and control tryptic digests are 
separately ICAT labelled and mixed. Samples are affinity selected and run on MS. Incremental differences of 8AMU allow matching of 
corresponding test/control peptides. Simplified examples of a single peptide present in equal amounts, up-regulated in test and down-regulated 
in test are illustrated. The m/z difference between pairs of peptides is given by the difference in mass of 8X as shown 
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Figure 10.9. Solid phase cysteine tags. These contain groups which react with cysteine thiols as well as a point where the tag can be cleaved 
from a solid phase for release of captured peptides. (a) Iodoacetamide-based thiol tag with photocleavage site. (b) N-Ethyl maleimide based 
tag with acid-labile site. (c) Iodoacetamide-based tag with acid-labile site 


hydrophobic with the result that they elute in a narrow part of 
the LC chromatogram. More recent variants of this approach 
have employed a solid phase capture and release process for 
trapping cysteine-containing peptides (Figure 10.9). Some 
of these tags have cleavable groups in the linker which facil- 
itate release of target peptides after other peptides have been 
efficiently washed away. Acid treatment shortens tags be- 
fore LC-MS-MS resulting in reduced collision energy loss 
during ionization with associated improvement in sensitiv- 
ity. However, acid treatment (50% trifluoroacetic acid) may 
be so severe as to affect acid-labile post-translational mod- 
ifications. A selection of smaller cysteine-specific tags is 
shown in Figure 10.10. 


10.2.3 Tagging of N- and C-Termini 


Other features of protein structure have been selected for 
tagging (Table 10.1). The N-terminus can be labelled with N- 
acetoxysuccinimide or succinic anhydride that are available 
in ‘light’ or deuterated forms (Figure 10.11). Thus a popula- 
tion of peptides arising from tryptic digestion provides many 
sites for tagging. Tags can also react with other primary 
amines present in the protein (e.g. lysine and arginine side- 
chains) which may complicate the peptide pattern some- 


what. However, by selecting appropriate reaction conditions, 
it may be possible to exclude these side-reactions. For exam- 
ple, N-terminal amino groups react 100-fold more strongly 
with nondeuterated/pentadeuterated phenyl isocyanate (Fig- 
ure 10.11) than do lysine e-amino groups at pH 8.0. 
Labelling of C-termini can be achieved by converting 
-COOH groups into methyl esters as shown in Figure 10.12. 
Methanol containing three hydrogen or deuterium atoms 
provides the methyl group. As with N-terminal labeling, this 
reaction will also tag aspartate and glutamate residues. An- 
other widely used C-terminal tagging method that does not 
label these residues is to introduce '8O into the C-terminal 
group formed during tryptic digestion (Figure 10.12). One 
protein mixture is digested in H2!8O while the other is di- 
gested in HO. Either one or two oxygen atoms become in- 
corporated into the C-terminus. It is even possible to achieve 
this effect by isotope exchange after tryptic digestion. This 
is done by incubating previously-digested proteins in either 
H,0 or H3 '80 in the presence of trypsin. !8O exchanges with 
160 under these conditions. Drawbacks to these C-terminal 
labeling approaches are the high cost of isotope-labeled 
water and the possible overlap of isotopic distributions of 
light and heavy forms of peptides due to incomplete incor- 
poration of '80, N- and C-terminal tagging are not mutually 
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Figure 10.10. Small thiol-specific tags for differential isotope 
coding (X = hydrogen or deuterium). (a) Acrylamide (b) 2- 
Vinylpyridine (c) N-substituted maleimides (R = CX3 or CX3CX2) 
(d) N-Substituted iodoacetamides (R = CX3CX»2 or C(CX3) 3) 
(e) Iodoacetanilide 


exclusive so it is possible to label peptides at both N- and 
C-terminal ends. This ensures that all peptides in a sample 
can be labeled even if the N- or C-termini of some of them 
are modified. 


10.2.4 Tagging for Tandem MS 


The tagging approaches described above (Sections 10.2.1- 
10.2.3) are mainly aimed at affinity labelling and compar- 
ative quantification. It may also be desirable to label pep- 
tides so as to improve MS sequencing especially where 
only small amounts of protein are available (e.g. spots ex- 
cised from 2-D SDS PAGE). This may be done by in- 
troduction of permanent negative or positive charges in 
such a way as to alter the distribution of a, b or y-ions 
(Figure 4.19). A negative charge may be introduced to N- 
terminal ions by sulphonation (Figure 10.13). This sup- 
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presses the N-terminal fragment ions so that only y-ions 
are observed, thus simplifying MALDI spectra. It is neces- 
sary to modify lysine residues (e.g. by guanidination) prior 
to sulphonation. A modification of this approach is to guani- 
dinate N-termini in 2-D SDS PAGE gels that can later be 
tryptic digested and sulphonated. Conversion of lysine to ho- 
moarginine means trypsin does not cleave at these sites but 
this approach nonetheless allows sequencing by detection 
of y-ions by MALDI MS Phosphorous-containing reagents 
also introduce positive charge as phosphonium groups at the 
N-terminus using reagents such as phosphines and phospho- 
nium salts (Figure 10.14). This causes preferential formation 
of N-terminal a- and b-ions which makes sequencing from 
MALDI fragmentation patterns possible (Figure 4.19). 


10.3 CHIP TECHNOLOGIES 
IN PROTEOMICS 


A third innovation feeding into the growing field of pro- 
teomics is chip technology. High density chips make it pos- 
sible to immobilize thousands of genes or proteins on a solid 
surface which can be read by high-throughput instrumenta- 
tion. This forms the basis of microarrays which can be used 
to screen for gene expression in functional genomics. This 
technology is now well-advanced and representative arrays 
are available for certain cell types or groups of proteins. 
These arrays may represent thousands of individual genes 
and provide a tool with which it is possible to interrogate 
the population of mRNAs in a cell or tissue of interest in a 
single experiment. The readout from the experiment is iden- 
tification of unchanged, undetected, up- or down-regulated 
genes in the sample. Based on success with microarrays, 
there is increasing interest in applying similar technology to 
proteins. Protein chips are increasingly important tools in 
interaction proteomics. 


10.3.1 Microarrays 


Microarrays are sometimes also called DNA arrays, DNA 
chips or gene chips. Typically, a microarray is a glass slide 
such as a wafer of quartz on which DNA molecules are 
attached at fixed locations (spots; Figure 10.15). This is 
achieved by a number of methodologies including print- 
ing by a robot, photolithography or inkjet printing. Each 
spot contains a number of identical molecules of a single 
DNA varying in length from 20 to hundreds of base pairs. 
They can originate from genome-wide oligonucleotide 
libraries of uniform length (25 bp) or else from cDNA li- 
braries. Slightly different results can be obtained from each 
since oligonucleotide-based arrays provide great specificity 
as aresult of uniform hybridization conditions conferred by 
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Figure 10.11. Isotope-coded tags for labelling peptide N-termini and lysine -amino groups. (X = hydrogen or deuterium. (a) Phenyl 
isocyanate (b) N-acetoxysuccinimide (c) Succinic anhydride (d) N-nicotinoxyloxysuccinimide 
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Figure 10.12. Isotope-coded tagging of peptide C-termini. (a) Conversion of -COOH into a methyl ester in methanol (X = hydrogen or 
tritium). (b) Digestion with trypsin in water with differing oxygen isotopes introduces one or two isotope atoms at the C-termini of peptides 
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Figure 10.13. N-Terminal sulphonation. (a) Tryptic digests produce C-terminal lysines and arginines. Lysine can be converted to homoargi- 
nine by guanidinylation. (b) N-terminal amino groups react with an N-hydroxysuccinimide ester of 3-sulphopropionic acid. (c) C-terminus 
has positive charge due to guanidinylation. Fragmentation by postsource decay MALDI produces a y-ion from each peptide which is detected 
and an uncharged b-ion which is not. Mass differences between adjacent y-ions allow calculation of sequence 
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Figure 10.14. N-terminal phosphination. Tryptic peptides can be acetylated at the N-terminus with the N-hydroxysuccinimide ester of 
N-tris(2,4,6-trimethoxyphenyl)phosphine to confer a permanent positive charge on the N-terminus. On fragmentation by MALDI, mainly 
a-type ions are generated. Incremental mass differences between adjacent a-ions give the amino acid sequence 
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Figure 10.15. DNA Microarrays. DNA molecules are covalently 
attached at specific spots on a quartz wafer. (a) Oligonucleotide- 
based arrays consist of DNA of equal length. (b) cDNA arrays con- 
tain DNA of variable length. A real microarray contains thousands 
of spots 


identical length. In cases where alternatively spliced tran- 
scripts are common (e.g. human mRNA), each transcript is 
recognized by a unique oligonucleotide. By contrast, such 
multiple transcripts will hybridize to a single cDNA. 

Ideally, each spot represents one gene or one exon of 
a gene but this is not always possible in practice due to 
the presence of gene families consisting of closely related 
sequences. Microarrays of the entire set of ~6000 yeast 
genes have been available since the late 1990s but most 
microarrays of human genes contain a subset of the human 
genome. A common use of microarrays is to measure relative 
levels of gene expression in two sets of closely related cells 
(e.g. test and control). This is done by extracting mRNA 
from cells to give a reflection of its transcriptome. Since 
mRNA transcription precedes appearance of a new protein, 
there is a loose correlation between levels of mRNA and its 
encoded protein. This correlation is not absolute, however, 
as mRNAs differ greatly in their stability and proteins are 
turned over at differing individual rates. 

The first step is to extract mRNA and to use this directly 
or else to generate synthesis of a cDNA with reverse tran- 
scriptase (Figure 10.16). The mRNA/cDNA population from 


each cell type is then labelled with a different fluorescent dye 
(e.g. green for control and red for test). The two populations 
are washed over the microarray and allowed to hybridize 
to the DNA printed on the spots. If a gene is preferentially 
expressed in the test sample, the spot will glow red when 
illuminated with a laser beam. If a gene is preferentially ex- 
pressed in the control sample (due to down-regulation), the 
spot will glow green. If a gene is equally expressed in both 
test and control, the spot will glow yellow. Image analysis 
software provides readout of this experiment in terms of spot 
identification and quantification. Microarrays usually come 
with built-in blank spots, control spots and at least duplicate 
spots for each gene. By comparing known levels of genes on 
a number of microarrays with observed spot intensities for 
genes under investigation it is possible to generate a gene 
expression matrix for the cells under investigation which 
can be related to results from other experiments. 


10.3.2 Protein Biochips 


Although not yet as well-established, it is possible to immo- 
bilize proteins in an array format conceptually similar to that 
of microarrays in such a way that they can recognize and 
bind protein components of complex biological mixtures 
(Figure 10.17). One way of doing this is to express individual 
recombinant proteins as cDNA clones in each spot. An array 
containing a set of structurally diverse antigens can be used 
to screen serum of patients with autoimmune disease (e.g. lu- 
pus erythematosus) allowing identification of autoantibody 
patterns that may vary from patient to patient and with iden- 
tity of the autoimmune disease. Other applications that take 
advantage of the biospecific recognition properties of pro- 
teins include protein-ligand and enzyme-substrate arrays. 
Examples of protein-ligand recognition include an array of 
~6000 yeast recombinant proteins in identification of novel 
calmodulin-binding proteins and an array of 37000 human 
recombinant proteins expressed in Escherichia coli in which 
13 proteins were identified as binding a biotin-labelled pep- 
tide (Example 10.3). Enzyme-substrate recognition is ex- 
emplified by arrays of protein kinases, phosphatases, per- 
oxidases and restriction enzymes. This complements use of 
activity-based chemical probes for enzyme classes in MS 
(Section 10.4.7). 


10.3.3 SELDI-TOF MS on Protein Chips 


Protein biochips comprise an array of proteins with specific 
binding properties. An alternative use of protein chips is as 
a derivatized surface with distinct chromatographic features 
including hydrophobic, hydrophilic, affinity, reversed phase, 
ion exchange and IMAC groups (Figure 10.18). This allows 
fractionation of the proteome so as to overcome limitations 
of traditional MALDI-TOF MS which can only handle a 
relatively limited number of proteins in a single analysis. 
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Figure 10.16. Detecting gene expression with DNA Microarrays. mRNA is isolated from matched test and control cells and converted to 
cDNA by reverse transcriptase. Each cDNA population is tagged with a fluorescent dye (red for test, green for control) and washed over 
the microarray. cDNA hybridises to its complementary gene. Un-hybridized cDNA is removed by washing. Scanning the microarray with 
a laser reveals up-regulated (green), down-regulated (red) and equal gene expression (yellow) at each spot of the microarray. Spots are in 
duplicate and the array contains both positive and negative controls. A colour version of this figure appears in the plate section of this book 
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Figure 10.17. Protein arrays. There are two general types of protein arrays. (a) Analytical array for detection of antigens in serum. Multiple 
antibodies are attached at spots on array. These can bind specific fluorescently labelled antigens in biological fluid (e.g. serum). In this example 
antigens a and d are detected. (b) Functional protein arrays are based on bioactivity of proteins such as receptor-ligand, enzyme-substrate 
recognition 
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Example 10.3 Use of activity based probes in profiling rat testis proteomes 


Activity-based probes (ABPs) are designed based on known recognition of reactive groups within the probe by the 
active sites of particularly well-understood enzyme classes (e.g. serine hydrolases). Nondirected probes may be able to 
recognize less well-understood enzyme families and these feature an additional binding group. A family or library of 
such probes for serine hydrolases was synthesized based on a sulphonate reactive group with a range of binding groups 
as shown below: 


LABEL/TAG 


These probes were used to analyse testis proteomes. The common biotin moiety allows capture of targeted proteins 
followed by SDS PAGE. (a) Probes 1-7, (b) Probes 8-11. Pre-boiling of extracts (denoted by A) abolishes much of 
the recognition. Notice that many of the bands recognized are common to several probes and that each probe has its 
individual specificity. 
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Reproduced from Adam, G.C., Cravatt, B.F. and Sorensen, E.J. (2001) Profiling the specific reactivity of the proteome with nondirected activity-based 
probes. Chemistry and Biology, 8, 81-95. Copyright ©) 2001, Elsevier. 
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Figure 10.18. SELDI MS. This is distinguished from MALDI MS by the use of protein chips in a range of formats based on chromatograpic 
interaction. These select subproteomes which can be ionized by laser desorption and analysed by MS. Output of the experiment is presented 
as an MS spectrum or as if the separation was performed in an electrophoresis gel 
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This approach has been commercialized (Ciphergen Biosys- 
tems Inc., Fremont, California, USA). An energy-absorbing 
matrix is added to the ProteinChip® so that, when laser 
energy is applied to the sample, the proteins present become 
ionized and they may be analysed by MS. The ionization 
step is called surface enhanced laser desorption ionization 
(SELDI) and detection is in a TOF detector (Section 4.1.4). 
This enables high-throughput, rapid and sensitive analysis 
of complex biological mixtures. 


10.4 POST-TRANSLATIONAL 
MODIFICATION PROTEOMICS 


In simple bacteria post-translational modifications occur in 
approximately 25% of gene products. In vertebrates, as 
many as 3—4 times the number of unique proteins is pos- 
sible from a given number of genes. This arises from pro- 
cesses such as subunit hybridization and alternative splicing. 
However, a relatively large number of eukaryotic proteins 
are also processed by post-translational modification (for 
examples of modifications see Table 4.3). This provides 


(a) a 


extensive options for controlling biological properties of 
proteins allowing greater complexity to be achieved with a 
given genome. They also provide a means for cells to adapt 
to changed circumstances such as altered nutrition status 
or exposure to oxidative stress. Phosphorylation in signal 
transduction cascades and the conferring of recognition on 
cell-surface proteins by specific glycosylation provide good 
examples of the biological importance of post-translational 
modifications. Elucidation of these structural changes has 
the potential to give insights into fundamental cellular pro- 
cesses and to events in the proteome. 


10.4.1 Proteolysis 


Because proteolysis produces smaller products of usually 
distinct mass and with altered physicochemical proper- 
ties compared to the substrate protein, these products are 
readily detectable in 2-D SDS PAGE or MS analyses. 
Zymograms (Section 5.2.4 and Example 5.2) provide a 
means of identifying the presence of proteases in elec- 
trophoresis gels which can have diagnostic value. Affin- 
ity probes for specific classes of enzymes have been de- 
veloped for proteomic analysis (Figure 10.19). Examples 


TAG 


Figure 10.19. Activity-based protein profiling. (a) Cysteine proteases. (b) Serine hydrolases (including serine proteases) 
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Figure 10.20. Activity-based probes. Activity based probes consist of a reactive group which reacts covalently with the enzyme active 
site, a linker (which contributes to selectivity) and a label or affinity tag which facilitates detection or capture of affinity-labelled proteins. 


(Adapted from Jeffery and Boygo, 2003) 


include inhibitors of cysteine and serine proteases and com- 
ponents of the 20S proteasome. These agents allow com- 
parative profiling of proteomes in which the target family 
of proteases is active/inactive. This is an example of activ- 
ity based profiling in which specific enzyme classes in the 
proteome are selectively targeted by small molecules that 
bind to their active site. Activity probes have three com- 
ponents: a reactive group (which reacts with the enzyme 
active site), a linker and fluorescent dye (which facilitates 
detection) (Figure 10.20). Directed activity probes have a 
well-characterized reactive group which targets a specific 
enzyme mechanism. In addition to proteases, such probes 
have been developed for tyrosine phosphatases and glycosi- 
dases. Non-directed probes have an additional binding group 
attached to the reactive group. Various binding groups can 
be used so a family of probes is generated. In the family of 
sulphonate esters described in Example 10.3, a nucleophile 


in the enzyme active site can displace the ester and accept 
the linker which remains covalently attached to biotin (Fig- 
ure 10.21). In this way, proteases with similar functions can 
be identified in the proteome. 


10.4.2 


Glycosylation is a key modification of proteins which is 
clinically very relevant in processes such as cell transfor- 
mation and tumorigenesis. It frequently confers biospecific 
recognition on proteins especially when expressed on cell 
surfaces. Two main points of attachment to proteins com- 
monly occur: N-linkage at asparagines in —Asn-X-Ser/Thr- 
(X is any residue except proline); O-linkage at serine or 
threonine (Figure 10.22). N-linked glycosylation sites are 
found in proteins expressed on the extracellular side of 
the cell membrane, in secreted proteins and in proteins of 
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Figure 10.22. Glycosylation sites in glycoproteins. (a) N-Linkage at Asn-X-Ser/Thr. X is any amino acid except proline. (b) O-Linkage at 
serine or threonine 
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Figure 10.23. Oxidation of glycoproteins. The cis-diol groups of monosaccharide components of glycoproteins are conveniently oxidized 
by periodic acid to form aldehyde groups. These react with either Schiff’s reagent or hydrazines to attach a dye or bead as a Schiff’s base or 
hydrazone derivative, respectively. Schiff’s reagent is used to stain glycoproteins in gels or after transfer to blotting membranes. Hydrazine 
capture is used to isolate glycoproteins for MS analysis (see Figure 10.24) 


bodily fluids. There is scope not just for modification per 
se but also for complex patterns of glycan branching which 
presents a major analytical challenge. In 2-D SDS PAGE 
separations, the principal methods for selectively and sensi- 
tively labelling glycoproteins depend on periodic acid Schiff 
base chemistry (Figure 10.23). After 2-D SDS PAGE, pro- 
teins transferred to PVDF membranes (Section 5.11.5) can 
be visualized with fluorescent dyes. Unlike other categories 
of biopolymers (e.g. proteins, lipids), the identity, sequence 
and arrangement of monosaccharide building blocks in gly- 
coproteins are difficult to analyse. Yet the biological prop- 
erties of glycoproteins depend strongly on these chemical 
details. 


MS methods combined with LC separation have been 
developed for proteomic comparisons of glycoproteins. In 
the case of N-glycosylation, these depend on two general 
types of affinity preselection (Figure 10.24). The first ex- 
ploits the affinity of glycoproteins for plant lectins which 
can be immobilized on a stationary phase (Section 2.4.5). 
Once captured by the lectins, glycoproteins can be eluted, 
digested with trypsin and reapplied to the lectin column. 
This allows specific removal of nonglycosylated peptides 
from the digest. Glycopeptides can be enzymatically degly- 
cosylated by the action of peptide N-glycosidase F (PNG- 
ase F) and collected. If performed in H2 !6O/H;2!80, this 
allows isotope-coded labeling at the point of glycosylation 
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(Figure 10.24). Released glycans can then be separated by 
normal or reversed phase HPLC (Section 2.4.3) followed 
by tandem MS (Figure 10.24). This generates a popula- 
tion of fragments with incremental mass differences cor- 
responding to recognizable units (e.g. glucose, mannose) 
ranging from the intact glycopeptide moiety down to the 
peptide core of the glycopeptide. Analysis of this popula- 
tion results in a plausible pattern of identity, sequence and 
branching. 

Hydroxyl groups of monosaccharide units can be con- 
veniently oxidized at cis-diols to form aldehydes (Figure 
10.23). Glycoproteins can be specifically captured with hy- 
drazine immobilized on beads resulting in a hydrazone link- 
age between the bead and protein. Tryptic digestion can 
now be performed and nonglycosylated peptides selectively 
washed away (Figure 10.24). Remaining glycosylated pep- 
tides can be isotope coded with succinic anhydride (Figure 
10.11) and enzymatically released from the beads by diges- 
tion with PNG-ase F. In principle, hydrazide capture will 
select both N- and O-glycosylated peptides. To remove the 
latter, it will be necessary to remove oligosaccharide units 
with a panel of exoglycosidases until a fuschin core is 
reached. This can then be removed by O-glycosidase. As 
not all glycoproteins contain this core, a chemical method 
such as -elimination would be better for general release of 
O-glycopeptides. 


10.4.3 Oxidation 


Oxidation is a naturally occurring process in most cells. It 
can arise from leakage of free radicals from the electron 
transport chain, as a consequence of the partially oxidiz- 
ing microenvironment of the endoplasmic reticulum or as 
a result of exposure to reactive oxygen species resulting 
in oxidative stress. Increased oxidation is a key factor in 
the ageing of cells and organisms as a result of decreasing 
efficiency of antioxidant processes. It is also a feature of 
much pathology including cancer, Alzheimer’s and cardio- 
vascular disease. Exogenous factors leading to production 
of reactive oxygen species include chemical pollutants such 
as metals. Oxidation has the potential to alter the structure 
of lipids (lipid peroxidation), DNA (e.g. formation of 8-oxo 
and hydroxylamine derivatives at bases causing genetic le- 
sions) and proteins. Some amino acid residues are espe- 
cially susceptible to oxidation and thus provide a ‘buffer’ 
to cells experiencing oxidative stress. We have previously 
seen that methionine can be reversibly oxidized to methio- 
nine sulphoxide (a marker of ageing in biology) or irre- 
versibly modified to methionine sulphone (Table 4.3). Simi- 
larly, cysteines (-SH) can be irreversibly oxidized to sulfonic 
(cysteic) acid (-SO3H) and reversibly oxidized to nitrosoth- 
iol (-SNO), sulphenic acid (-SOH) sulphinic acid (-SO,H) 
and thiyl radicals (S`). Other redox-based modifications 


at cysteines include cysteinylation and S-glutathionylation 
(Section 10.4.4 below). Some amino acid side-chains can be 
oxidized to aldehyde or ketone groups (collectively called 
carbonyls) and these modifications can target proteins for 
turnover. As well as being an indicator of cell ageing or 
pathology, oxidative modification of proteins is important 
in signal transduction in cells. Nitration of tyrosine to form 
3-nitrotyrosine can interfere with the activities of protein 
kinases. There is major interest in being able to identify 
components of the proteome which are targets for oxidative 
stress. 

Protein thiols can be quantitatively labelled with N-ethy] 
maleimide, iodoacetamide or iodoacetic acid. Advantage is 
taken of this in the design of ICAT probes by incorporat- 
ing an iodoacetoxy group as part of the label as described 
in Figure 10.8. Thiols involved in mixed disulfides with 
cysteine or glutathione are highly redox-sensitive and often 
form part of the cell’s response to oxidative stress (Sec- 
tion 10.4.4 below). An approach to their study is described 
in Figure 10.25. Free thiols are first labelled with iodoac- 
etamide/iodoacetic acid. This is followed by reduction with 
areducing agent (e.g. DTT) to reveal thiols involved in disul- 
fides followed by labelling with a tagged reagent. Examples 
of appropriate tags include the iodoacetoxy and maleimide 
ICAT probes shown in Figure 10.10 or iodoacetamide- 
fluorescein. These can be detected respectively in tandem 
MS systems or 2-D SDS PAGE (by western blotting with 
antifluorescein). 

Carbonylation of proteins has the effect of introducing 
chemically reactive aldehyde or ketone groups. These can 
be conveniently labeled with hydrazine attached to dinitro- 
phenol (DNP). Thus, carbonylated proteins can be identified 
in 1-D or 2-D SDS PAGE separations by western blotting 
(Section 5.11.2) with anti-DNP (Figure 10.26). This anti- 
body can also be used to ‘capture’ carbonylated proteins 
by immunoprecipitation followed by tryptic digestion and 
protein identification by tandem MS. 

Similar approaches can be taken to detection of nitro- 
sylated proteins. Tyrosines are irreversibly nitrosylated to 
3-nitrosotyrosines by peroxynitrite, a potent oxidant. An an- 
tibody to 3-nitrosotyrosine can be used to find proteins with 
this modification in 2-D SDS PAGE separations. This anti- 
body can also be used to immunoprecipitate labeled proteins 
followed by identification by tandem MS. S-nitrosylation at 
cysteines (SH is converted to SNO) is a result of production 
of NO and provides a reversible means by which this im- 
portant molecule can regulate activities of various proteins. 
Nitosothiols can be detected by a similar method to that 
described in Figure 10.25 except that nitrosothiols can be 
reduced by ascorbate (which does not reduce disulfides). 
An immunoblotting/immunoprecipitation approach is 
made possible by availability of specific antibodies to -SNO 
thiols. 
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Figure 10.25. Approaches to proteomic study of protein disulfides. Free thiols react readily with iodoacetoxy reagents to form SAH. 
Treatment with a reducing agent (e.g. DTT) reveals -SH groups previously involved in disulfides. These can be labelled for 2-D SDS PAGE 
(iodoacetamide-fluorescein) or for MS (ICAT) 


07 =0 =N-NH- DNP 
O= DNP-NH - N= 
=0 —> =N-NH- DNP 
O= DNP-NH - N= 
+ 
DNP-NH - NH, 2-D SDS PAGE 
FOLLOWED BY 
TRANSFER TO 
NITROCELLULOSE 
“s< 
Anti-DNP 


Figure 10.26. Detection of protein carbonyls. Proteins are treated with DNP-hydrazine. This reacts with carbonyls to form hydrazones. 
After electrophoretic separation in 1-D or 2-D SDS PAGE, carbonylated proteins can be detected with anti-DNP. 
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Figure 10.27. Enzymatic reducing systems. Protein disulfides can be reduced by thioredoxin and glutaredoxin while hydroperoxides are 
reduced by peroxiredoxins. (a) Thioredoxin has a conserved active site motif, reduced by thioredoxin reductase which is in turn reduced 
by reducing equivalents originating in NADPH + H+ from the pentose phosphate pathway. (b) Glutaredoxin is reduced by glutathione 
which is in turn reduced by glutathione reductase. (c) Peroxiredoxins are of three types. /-Cys contain a single cysteine which is oxidized to 


sulphenic acid by the hydroperoxide and reduced by an unknown reductant. Dimeric 2-Cys peroxiredoxins have two pairs of cysteines. The 
peroxidatic cysteine forms sulphenic acid which is reduced by the second (resolving) cysteine on the other subunit. In monomeric 2-Cys 


peroxireductases, the cysteine pair is on the same subunit. Peroxiredoxins are regenerated by the thioredoxin system as shown 


10.4.4 Protein Disulfides 


The -SH group of cysteine can be present in proteins as a 
free thiol (-SH) or associated in disulfide bridges (-S-S). 
This can occur naturally (e.g. ribonucleotide reductase is 
inactive in a disulfide bridged form but is activated when 
—S-S-— is reduced to the dithiol form SH HS) or can result 
from oxidative stress. When —SH is oxidized to sulfenic 
acid (-SOH), it can react with a nearby -SH or with the 
-SH of glutathione to form a disulfide. Cells possess several 
enzyme families, collectively grouped as the thiol-disulfide 
oxidoreductase superfamily, which reduce protein disulfides 
on specific protein targets (Figure 10.27). This may protect 


these —SHs against irreversible oxidation just long enough 
for the cell to recover from an oxidative event or it may be 
part of sampling possible disulfide bridge patterns involved 
in protein folding. These enzymes contain one or more disul- 
fide bridge(s) in their active site which is reduced to activate 
the protein. The pKa of the catalytic -SH groups are of- 
ten perturbed either far below or far above those of normal 
cysteines which confers extra reactivity upon them. Each 
protein family recognizes different substrates, is reduced 
in different ways and contains specific consensus motifs 
around the active site disulfide which makes them distinct. 
The thioredoxins reduce disulfides to dithiols using reducing 
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Figure 10.28. Diagonal gel electrophoresis. (a) Proteins with intrachain disulfides have more open conformation and hence electrophorese 


more slowly in SDS PAGE. Proteins lacking disulfides migrate identically in reducing/nonreducing conditions while proteins with interchain 
disulfides migrate as distinct subunits in reducing conditions. (b) Proteins are electrophoresed in nonreducing conditions. The track is excised 


and exposed to reducing conditions. Most proteins cluster on the diagonal while those with interchain disulfides migrate below it and those 


with intrachain disulfides migrate above it 


equivalents provided by FADH); glutaredoxins are reduced 
by glutathione while peroxiredoxins (thioredoxin-dependent 
peroxide reductases) are themselves substrates for thiore- 
doxins and directly reduce hydroperoxides in cells. The per- 
oxiredoxins are subdivided into two classes, /-Cys and 2- 
Cys, with the latter class further subdivided into monomers 
and dimers (Figure 10.27). Members of the thioredoxin su- 
perfamily protein disulfide isomerases catalyze formation of 
disulfide bonds in the endoplasmic reticulum of eukaryotes 
while another thioredoxin, DsbA, forms disulfide bridges 
on proteins translocated into the E. coli periplasm. In both 
cases, these enzymes help proteins to fold into their cor- 
rect disulfide bridge pattern before folding is complete. For 
this reason, the ER is a partially oxidizing compartment of 
eukaryote cells. Reducing equivalents for these processes 


are supplied ultimately by NADPH + H* from the pentose 
phosphate pathway. If the requirement for reducing equiv- 
alents exceeds the supply capacity of this pathway, then 
oxidative stress can result. 

The importance of disulfide bridges in protein folding 
has led to development of methods for identifying differ- 
ences in disulfide patterns within the proteome. One of the 
most popular is diagonal gel electrophoresis whereby sam- 
ples are exposed sequentially to nonreducing and reducing 
conditions during SDS PAGE (Figure 10.28). This is a con- 
venient method for distinguishing proteins which have in- 
terchain or intrachain disulfides in a single separation where 
nondisulfide bridged proteins cluster along a diagonal. Vari- 
ations in disulfide patterns resulting from oxidative stress 
can be revealed by comparing patterns. Proteins which are 
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Example 10.4 Use of diagonal gels to study protein disulfides 


Proteins may have intrachain disulfide bridges which are often an important means of stabilizing tertiary structure. 
Interchain disulfide bridges may attach separate polypeptides to each other. In cells exposed to oxidative stress the 
enzymes responsible for maintaining protein thiols in their reduced state can be overcome resulting in changes in the 
pattern of disulfides within the proteome. Diagonal gel electrophoresis provides a means of distinguishing these changes. 
The method involves performing one dimensional electrophoresis under nonreducing conditions. A slice of gel containing 
the separated proteins is then excised and exposed to reducing conditions (e.g. DTT) and electrophoresed 90° to the 
original electrophoresis direction. The majority of proteins in the proteome lack disulfide bridges and therefore migrate 
along the diagonal. Proteins containing intrachain disulfides have a more open structure and migrate more slowly in 
reducing conditions. These therefore appear as spots above the diagonal. Proteins with interchain disulfides migrate as 


Nonreducing Direction 


$101-S104. Copyright © 2006, Elsevier. 


substrates of members of the thioredoxin superfamily can 
also be routinely identified in this analysis (Example 10.4). 


10.4.5 The Phosphoproteome 


Phosphorylation is one of the most important post- 
translational modifications of proteins. The reaction is cat- 
alyzed by protein kinases which make up almost 2% of the 
genes encoded in the human genome. Protein phosphatases 
dephosphorylate proteins and many proteins are only ac- 
tive in their phosphorylated form making phosphate a con- 
venient chemical ‘on-off’ switch. Phosphorylated protein 
variants are a feature of signal transduction cascades in the 
cell. Phosphate groups are conjugated to serine, threonine or 
tyrosine side-chains and some 30% of proteins are capable 
of being phosphorylated. The phosphoproteome is the com- 
plement of proteins existing in phosphorylated form under 


separate polypeptides under reducing conditions and thus appear as spots below the diagonal. 


Reproduced from McDonagh, B., Tyther, R. and Sheehan, D. (2006) Redox proteomics in the mussel Mytilus edulis. Marine Environmental Research, 62, 
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a given set of conditions. Because of the central importance 
of phosphorylation in signal transduction, it represents a key 
target for study. 

A tiny proportion of proteins are phosphorylated at any 
given time. For 2-D SDS PAGE, phosphoproteins can be 
directly labeled with °P and identified using autoradiagra- 
phy of separations. However, this approach does not lend 
itself to high-throughput analyses. It is often necessary to 
enrich for phosphoproteins in order to arrive at quantita- 
tive data. Antibodies to phosphate groups and metal affinity 
columns can be used for this purpose but frequently prepa- 
rations are contaminated with nonphosphorylated proteins. 
It is also possible to replace the phosphate group with bi- 
otin affinity tags allowing for enrichment of phosphopep- 
tides/ proteins on avidin (Figure 10.29). This takes advan- 
tage of the fact that phosphates are labile at alkaline pH 
and can undergo -elimination forming a reactive dehydro- 
analyl residue. This can act as an acceptor of a sulphydryl 
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Figure 10.29. Enrichment of phosphopeptides by Michael addition. Protein thiols are first oxidized to cysteic acid with performic acid 
and the oxidized protein digested with trypsin. Phosphate is eliminated from peptides containing phosphoserine under alkaline conditions. 
Ethanedithiol adds to the «—-unsaturated bond in a Michael addition labelling the site of phosphoserine with —SH. This reacts with 
biotin- NEM. On tandem MS, a fragment characteristic of the biotin label is generated only in phosphopeptides. The Michael addition also 


occurs with phosphothreonyl] peptides albeit at a slower rate 
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Figure 10.30. ICAT identification of phosphopeptides by Michael addition. A peptide containing phosphoserine (X = H) or phosphothre- 
onine (X = CH3) is pretreated to block -SH groups (by alkylation). Phosphate is then removed by fß-elimination (see Figure 10.27). The 
&x— unsaturated bond is susceptible to a Michael-type addition of 1,2-ethanedithiol which is available in protonated or deuterated forms 
resulting in a 4 Da mass difference between matched peptides on LC/MS. Tandem MS identifies site of modification in specific peptides 
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Figure 10.31. Identification of phosphopeptides by solid phase capture. Phosphopeptides in a tryptic digest are blocked with tBOC groups 
followed by condensation of phosphates/carboxyls to ethanolamine. Trifluoroacetic acid (TFA) regenerates phosphates selectively and these 
are reacted with cystamine introducing -SH on phosphates. Phosphopeptides are captured by iodoacetic acid coated glass beads. TFA 


releases and deblocks peptides which are analysed by LC-Tandem MS 


reagent in a Michael-type addition. Provided the phospho- 
peptides originated from oxidized proteins, only phospho- 
peptides will now carry an -SH and these can be labeled with 
NEM-biotin and collected on avidin. Tandem MS generates 
a characteristic fragment from this label. 

Methodologies resembling the ICAT approach (Section 
10.2.2; Figures 10.10 and 10.11) have been developed for 
enrichment and MS investigation of phosphopeptides. The 
most popular methods preoxidize -SH groups with later in- 
troduction of new —SH groups at sites of phosphorylation. 
These are amenable to modification with -SH ICAT probes. 
Solid phase capture ensures efficient washing steps and en- 
richment of phosphopeptides. 8-Elimination, followed by 
Michael addition, adds 1, 2-ethanedithiol at the site of phos- 
phorylation (Figure 10.30). This dithiol is available in both 
protonated and deuterated forms thus facilitating isotope 
coding. The -SH attached at the phosphorylation site can 
react with iodoacetetic acid attached to biotin and labeled 
peptides can be captured on avidin for LC-tandem MS anal- 
ysis. The biotin label shows up in the tandem MS as a 
characteristic fragment denoting phosphorylation. A slightly 
different approach condenses ethanolamine to both phos- 
phate and carboxyl groups with selective regeneration of 
the phosphates (Figure 10.31). These then react with cys- 
tamine which introduces —SH at the phosphorylation site. 
Labeled peptides can be captured on glass beads coated with 
iodoacetic acid and peptides eluted on treatment with TFA. 
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Appendix 1 
SI Units 


Table 1.1. Base and derived units of the SI system Table 1.3. Some important physical constants 

Quantity Unit name Symbol Atomic mass unit (AMU) 1.661 x 10-4 g 
Avogadro’s number (N) 6.022 x 1073/mol 

a BASE UNITS Boltzmann constant (k) 1.381 x 10% J/K 

ass Gram 8 Universal Gas Constant (R) 8.315 J/mol.K 

Length Metre m ; 34 

Time Second j Planck’s constant (h) 6.626 x 107 J.s 

Current Ampere A Velocity of light (c) 2.998 x 10! cm/s 

Temperature Kelvin K Angstrom (A) 107! m 

Amount Mole mol Electron volt (eV) 1.602 x 107” J 

Luminous intensity Candela cd in 

DERIVED UNITS 

Volume Litre L 

Energy Joule J i 

Voltage Volts v Table 1.4. Some useful conversion factors 

Resistance Ohms Q Temperature k=°C +273 

Force Newton N Energy 1J = 0.239 calories = 10’ erg 

Pressure Torr t Pressure 1t= 1.333 x 10? Pascal = 1.32 x 107° 

Frequency Hertz Hz atmosphere 

Radioactivity Curie Ci 


Radioactivity 1 Ci = 3.7 x 10!° dps = 37 Bequerel 


Table 1.2. Prefixes commonly used with the SI system 


10° giga G 
106 mega M 
10° kilo k 
107! deci d 
107? centi c 
10-3 milli m 
1076 micro u 
107° nano n 
10-2 pico P 
10-5 femto f 
10718 atto a 
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Appendix 2 


The Fourier Transform 


Sometimes in science we need to relate two sets of numbers. 
A simple example is the relationship between the side of a 
square and the area of a square as follows: 


Side of square (m) Area of square (m°) 


ONNr 
N 
Nn 


Note that there is a unique relationship between the side 
and area numbers and that the value for area can be calcu- 
lated from that of the side and vice versa. Note also that the 
units change when the side of square data is transformed 
into area data and that the two data-sets could be related 
by a mathematical equation which also relates their units 
(e.g. side? = area). This is a general example of a trans- 
form. Such mathematical operations can be useful in relat- 
ing some quantity which we can experimentally measure to 
some other property of the system under investigation which 
we wish to quantify. For example, using the simple equation 
above we can calculate the area of any square from its length 
without needing to measure the area directly. 

Joseph Fourier (1768—1830) was a French mathematician 
and physicist who, because of his interest in heat conduc- 
tion, developed a mathematical method for the representa- 
tion of any discontinuous function in space or time in terms 
of a much simpler trigonometric series of continuous cosine 
or sine functions. Such a series is referred to as a Fourier 
series and the process of dissection into cosine and/or sine 
components is called Fourier analysis. The opposite or 
inverse operation, recombining cosine and/or sine functions, 
is called Fourier synthesis. 

Figure A2.1 shows an example of a problem studied by 
Fourier. If part of a metal bar is heated, the temperature of 
the bar is highest at the point where it is heated and this heat 
gradually spreads through the whole bar (Figure A2.1(a)). If 
we plot the temperature as a function of distance along the 
bar we see that this is an example of a discontinuous function 
(b). Fourier realized that, if the temperature was decomposed 


into continuous sinusoidal curves having 1, 2, 3 or more 
cycles along the bar (c), then summation of these would 
yield a good approximation of the original discontinuous 
function (dashed line in (d)). The more curves added into 
the Fourier series, the better the agreement of the Fourier 
synthesis with the experimental measurement. 

In Chapter 3 (Figure 3.1) we saw that any wave function 
can be defined in terms of amplitude, A and wavelength, 
A, or frequency, v. In Chapter 6 we further saw that a full 
description of such a wave also requires definition of a phase, 
ġ (Figure 6.46). A simple one-dimensional wave function, 
F(x), is shown in Figure A2.2. f(x) specifies the height of the 
wave at any horizontal point x where x is defined in terms of 
wavelength such that x = 1 = à. This wave function could be 
represented as either a sine or a cosine function as follows: 


f(x) = A cos 2m(vx + ) 
or 
f(x) =A sin2a(vx + ) 


The term 27 appears because there are 27 radians per wave- 
length. Fourier analysis of complex wave functions disso- 
ciates them into a Fourier series of simple wave functions 
such as f(x). We could write a Fourier series containing n 
terms as 


F(x) = Ao cos 2x (0x + do) + Ai cos 2r (x + Qı) 


+A 2 cos 27 (2x + h2) - -- + An cos 2r (x + Qn) 


= om Ay cos 27 (vx + dy) 


v=0 


This shows that any complex wave can be expressed as a 
composite of simpler cosine or sine waves such as those in 
Figure A2.1. 

If a complex discontinuous function is periodic (although 
this is not essential) with a period T such that f(t + T) =f (t), 
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Figure A2.1. Heat Conductance in a metal bar. (a) A bar is heated at a point denoted by arrow, (b) Temperature distribution along the bar. 
(c) Temperature may be represented as a Fourier series of continuous wave functions along the bar. This dissociation into individual waves 
is called Fourier analysis, (d) Summation of these continuous functions (Fourier synthesis) gives a good representation of the discontinuous 
function. The more waves added together (i.e. the more terms in the Fourier series), the closer the fit to the data shown in (b). 


it may be represented as a Fourier series of the form 


F) = ao/2 + ay cos 2a (t/T) + az cos 27 (21/ T) 
+--+ bisin2r(t/T) 


loo) CO 
= > + Xa, cos 2m (nt/T) + >. sin27(nt/T) 


n=1 n=1 


loo) 


5 cne” ®"/T) 


n=—00 


where t is time, n is the number of components in the series, 
iis the imaginary number (i? = —1)'? and a, b, c and so on 
are coefficients describing the individual waves in the series. 

This mathematically expresses the fact that a discontin- 
uous function can be dissected into individual sine/cosine 


Figure A2.2. A one-dimensional wave. The wave can be uniquely 
defined in terms of phase (@) amplitude (A), wavelength (A) or 
frequency (v). 


wave functions which may in turn be summed to arrive at 
an expression for it as a function (in this case) of time. The 
relationship between the two sets of functions can be gener- 
alized with a pair of equations which are Fourier transforms 
of each other: 


fos f F(y)e™'dy 
F(y) = f 7 f(xje "dx 


where i? = —1. 

A good practical analogy for Fourier analysis is the 
diffraction of white light from the sun after passage through 
a prism into waves of individual frequency and amplitude 
which we described in Chapter 3 (see Figure 3.2). The 
strength of sunlight incident on the prism may be expected 
to vary with time which, in turn leads to variation in the 
amplitude of each diffracted frequency. The prism therefore 
acts to transform a strength-time domain into an amplitude- 
frequency domain. 

Many of the techniques described in this book result in 
complicated discontinuous functions which, like white light 
or the temperature of a heated metal bar, may be regarded 
as composites of simpler continuous wave functions. The 
Fourier transform is special because the equation relating 
sets of numbers which can be interconverted by the Fourier 
transform contains sine and cosine terms allowing us to 


relate the complex wave to a series of simpler waves in 
a fixed and meaningful way. This allows us to determine 
unknown discontinuous experimental functions such as the 
electron density of an atom (a function in space) or the 
electromagnetic spectrum of a molecule (which can be rep- 
resented as the time-inverse function, frequency). 

In X-ray diffraction experiments the electron density in 
one dimension may be represented as a complex, discon- 
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tinuous wave function in space (Figure A2.3; see Chap- 
ter 6; Figure 6.55). This is composed of many simpler 
electron density waves which could combine together in an 
infinite variety of ways depending on their relative phases. 
However, if the relative phases are known, these individual 
electron density waves represent a Fourier series and can be 
combined by Fourier synthesis to generate a unique overall 
one-dimensional electron density wave. 


Intensity 


te 


Electron density waves 


Time 


Free induction decay (FID) 


pp 


Retardation 


Interferogram 


Figure A2.3. Practical applications of the Fourier transform. The Fourier transform allows analysis of discontinuous functions in space or 
time in terms of continuous wave functions. Examples include the following: (a) Calculation of electron density maps. A one-dimensional 
section (dashed line) through a C=O group is shown. Summation of electron density waves of known phase (right) allows calculation of 
a one-dimensional electron density map (left), (b) Multi-dimensional NMR. The free induction decay (right) is the Fourier transform of 
the NMR spectrum (left), (c) Fourier transform infrared spectroscopy. The interferogram (right) is the Fourier transform of the infrared 
spectrum (left). Note the change in units commonly found in Fourier transform. 
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If the situation shown in Figure 6.55 is extended to three 
dimensions (xyz), we can use summation of structure fac- 
tors and relative phase angles to calculate electron density 
in three dimensions. The Fourier transform therefore pro- 
vides a means to move back and forth between these two 
sets of numbers, one representing reciprocal space where 
we make our experimental measurements (i.e. intensities of 
Bragg reflections = | Fj,;|? and relative phase angles, aj, 
determined as described in Sections 6.5.1 and 6.5.3-6.5.6, 
respectively) while the other represents real space (i.e. the 
electron density map). Thus the electron density, (xyz) is 
the Fourier transform of the array of structure factors and 
the array of structure factors is the inverse Fourier transform 
of the electron density with both the structure factors and 
the electron density representing Fourier series of simple 
waves: 


Structure factor: 
Pau = f p(xyzje'? dV 
v 
Electron density: 


1 f 
xyz) = — Falet dV 
p(xyz) yd! Akt 


where (@ = 27) (hx + ky + lz). 

In NMR (Figure A2.3(b); see Chapter 6; Section 6.2.1) the 
result of the experiment is also a complex wave called a free 
induction decay which is represented as a plot of intensity 
as a function of time. Fourier analysis of this complex wave 
allows transformation into a conventional NMR spectrum 
of intensity as a function of frequency. That they are related 
by the Fourier transform is clear from the following: 


Time domain: 
29 . 
f= it F(s) e" ds 
=O 


Frequency domain: 


F(s) = I - fA e dt 


Fourier transform infrared spectroscopy (Figure A2.3(c); 
see Chapter 3; Figure 3.43) produces a resultant wave called 
an interferogram which represents an interference pattern 
between simpler waves partially out of phase with each 
other. Many of these are generated during a single experi- 
ment and their pattern, which is plotted as a function of re- 
tardation, is characteristic for the infrared absorbance spec- 
trum of each sample molecule. Fourier analysis simplifies 
the calculations necessary to analyse these interferograms 
by dissecting out the simpler waves from each other and 
by transforming the retardation domain into a frequency 
domain which can be plotted as a conventional infrared 
spectrum. In both these spectroscopic techniques there is a 
fixed relationship between the complex wave function on 
the one hand and the simplified spectrum on the other which 
facilitates their interconversion by the Fourier transform. 

The reason the Fourier transform is so widely used is 
that it offers specific computational advantages over other 
mathematical approaches. However complex the Fourier se- 
ries, it is related to the original function by a comparatively 
simple equation making it possible to move from the real- 
world domain of space or time to the Fourier domain of fre- 
quency, phase and amplitude. Moreover, considerable error 
can accumulate due to shortcomings in real-world experi- 
mental systems such as those used in diffraction or spec- 
troscopy. Development of the fast Fourier transform algo- 
rithm has facilitated very rapid calculation by computer in 
the Fourier domain which can smooth out and diminish this 
error, resulting in a more accurate overall measurement. 
The Fourier transform therefore provides a computationally 
versatile tool to analyse complex functions arising from ex- 
perimental measurements by dissecting them into simpler 
wave functions which can be used to determine experimen- 
tal unknowns. 
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Absorption 
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immobilization, 31—5 
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in affinity chromatography, 
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in immunoelectrophoresis, 
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equipment, 177 
Agre, Peter, 307 
Alanine, chirality of, 79 
Albumin, bovine serum, 63 
ADP, phosphorylation of, 97 
Affinity gradient chromatography, 
33, 35 
Alanine, 79, 
Aldehyde, 370, 372 
Aldolase, 27 
Algorithm, 320 
Alkaline phosphatase, 272 
Allosterism, 85, 206, 211-2 
Aluminized mylar, 120 
Alzheimer’s Disease, 338, 374 
Amide bands, 87 
Amido black, 151 
Amino acids 
as spin systems, 216-7 
blocked 
with F-moc, 133-4 
chemical activation, by 
pentafluoropheny] ester, 134 
chirality of, 78-9 
deblocking, 133-4 
fragmentation in MS, 126 
glycosylation, 131-2 
isobaric, 126, 133 
phosphorylation, 131-2 
3-Aminonaphthalhydrazine, 67 
Aminopeptidases, 130-1 


Aminoproteinases, 131 
Ammonia, 118 
Ammonium bicarbonate, 286 
Ammonium sulphate precipitation, 
31 
AMP, absorbance spectrum of, 60 
Ampholytes, 169-70, 351 
Amplitude, 54 
of Bragg reflections, 340-1 
Ampholyte, 167, 169, 190 
Amphoteric molecules, 3 
AMU (atomic mass unit), 113, 118, 
381 
Amyloglucosidase, 170 
Analyzer, 123 
double-focusing magnetic sector, 
123-4 
electrostatic, 123-4 
ion cyclotron resonance fourier 
transform, 124 
ion trap, MS, 125 
quadropole mass, 123-4 
time-of-flight (TOF), 124, 125 
Anfinsen, 
cage, 208-9, 212 
Christian, 199 
Angle of incidence, 105, 236, 238 
Angular frequency of rotation (w), 
124 
Angular velocity of rotation (w), 
266 
1-Anilino-8-naphthalene sulphonate 
(ANS), 67, 72-3, 150-1, 200 
Anion exchange, 22-5, 172 
Anthracene-9-carbonyl-N'!- 
spermine, 83 
Anion exchangers 
in chromatography, 23-5, 49 
in chromatofocusing, 170, 172-3 
Anomalous diffraction, see X-ray, 
anomalous scattering 
Anti-Stokes lines, 91 
Antibody, 31, 172, 288, 362-3, 376 
complexes, 174 


Physical Biochemistry: Principles and Applications, Second Edition David Sheehan 


© 2009 John Wiley & Sons, Ltd 


388 INDEX 


Antibody (Cont.) 
constant region, 173-4 
in fluorescence staining, 75-6 
monoclonal, 172 
polyclonal, 172 
titre, 174 
variable region, 173 
Antigen, 31, 75-6, 172, 288, 362-3 
Apical domain, of GroEL, 207 
Apoptosis, 289 
TUNEL (terminal 
deoxynucleotidyl mediated 
dUTP nick end-labelling) 
assay of, 289 
Aquaporin, 255, 305, 310, 319, 
322, 329, 331-2, 337-9, 
343-4 
Arc repressor, 221-3 
Archimides’ principle, 266 
Archive, 308 
Area detector, 234, 236-7 
Argand diagram, 240 
Argon, 123, 127 
Arrhenius 
equation, 5, 293-4 
plot, 294 
Arrays 
DNA, 359, 362-3 
protein, 363 
Assay 
band shift, 166, 168 
Bradford, 8, 63 
creatine phosphokinase, 97 
dehydrogenase, 63 
ELISA, 76, 108 
ß-galactosidase, 65 
Luciferase, 66-7 
Peroxidase, 65—7 
Phosphatase, 65 
Sphingomyelinase, 65 
TUNEL (terminal deoxynucleotidyl 
mediated dUTP nick 
end-labelling), see Apoptosis 
Association constant, 71, 109 
Asymmetric unit, 227, 241, 246 
Asymmetry, centre of, 78—9 
Atomic coordinates, 330 
Atomic force microscopy (AFM), 
103 
Atomic mass unit (AMU), 113, 
381 
Atmospheric pressure ionization 
(APD, 121-2 


Atmospheric pressure chemical 
ionization mass spectrometry 
(APCI/MS), see Mass 
spectrometry 

ATP-ase, 206, 207, 342 

AutoDock, 346 

Automated modelling, 342, 344-5 

Autoradiography, 151, 163, 166, 376 

Autosampler, 13, 20, 40, 188 

Avidin, 31, 376, 378 

Avidity, 173 

Avogadro’s number, 71, 268 

Axial ratio, 84, 178, 266 


20Ba, 120 
Back pressure, 40, 45 
Bacteria 
genomes, 309 
mesophilic, 5 
taxonomy, 182 
thermophilic, 5, 200 
Bacterial artificial chromosome 
(BAC), 314-5 
Bacteriophage, see Phage 
Bacteriorhodopsin, 101, 254 
Band 
-broadening 
in chromatography, 15, 17-8 
in electron spin resonance 
(ESR), 99 
shift assay, see also SSCP 
analysis, 166-7, 247 
Barbituric acid, 7 
Base pairing, Watson-Crick, 77, 192 
B-DNA, 247 
Beam 
stop, 234, 236 
splitter, 62, 87-8 
Beer-Lambert law, 58, 80 
Biased reptation model, of DNA 
electrophoresis, 178, 182 
Bifunctional reagent, 159 
Bilayer, phospholipid, 99 
Bilirubin, 74, 83 
Binding, constant, 297 
isotherm, 297 
Binomial expansion, 95 
Biocalorimetry, 293-303 
Bioinformatics, 305-347 
tools, 307 
Biological membranes, see 
Membranes 
BioMagRes database, 330 


Biomolecules 
chemical attributes of, 2 
physical attributes of, 2 
Biot, Jean-Baptiste, 78 
Biotin, 77, 357, 364, 367, 368, 
376-8 
BLAST, 306, 308, 317-9, 344 
Blank, spectrum, 62 
Blotting, see also Electroblotting, 
149 
dot, 172-4 
northern, 191, 194-5, 316 
Southern, 191, 192-4 
western, 172, 184, 189-92 
Blue native gel electrophoresis, 
354-5 
Bohr 
magneton, 98 
Neils, 55 
Boltzmann 
constant, 72, 279 
distribution, 279, 294 
Bond 
angle, 58, 217, 218, 252 
energy, 4 
length, 58, 252, 254 
Bonded phases, 12 
Boundary, 274 
spreading, 274-5 
Bovine pancreatic trypsin inhibitor, 
169, 204 
Bovine serum albumin (BSA), 174, 
193 
Amax, 58 
Bovine spongiform encephalopathy 
(BSE), see Spongiform 
encephalopathies, 
transmissible (TSE), 
Bradford assay, 8, 63 
Bragg 
law of diffraction, 236-8 
reflections, 238—40 
5-Bromo-4-chloro-3-indolyl-b- 
galactopyranoside (X-Gal), 
65 
Bromophenol blue, 63, 
Brønsted acids and bases, 3 
BSE, see Spongiform 
encephalopathies, 
transmissible (TSE), 
Buffer, 3, 6-7, 279, 281-6 
additional components, 8 
altering conditions, 24, 27 


amphoteric, 172 
concentration, 7 
Good’s, 7 
in chromatography, 20-1 
running (in electrophoresis), 152, 
179 
wash, 33 
Buffering capacity (£), 6-7 


2c 
AMU, 118 
% abundance, 118 
scattering factor, 236 
BC 
AMU, 118 
% abundance, 118 
in NMR, 212, 220 
C-4, 21 
C-8, 21 
C-18, 12, 21, 28-30, 133, 141, 
190 
Caesium, 123 
Chloride (CsCl,), 272 
gun, 123 
Californium (Cf), 120 
Calorimeter, 292 
isobaric, 294 


Calorimetry, see also Biocalorimetry 


and Protein folding 
applications of, 297-300, 301-2 
differential scanning (DSC) 
down-scans in, 300 
up-scans in, 300 
Cambridge Crystallographic Data 
Centre, 306, 329 
Cantilever, 103—4 
Capacity factor, in chromatography, 
19 
Capillary 
array electrophoresis, 188 
column, 128 
electrochromatography, 190 
electrophoresis (CE), 129, 149, 
183-90, 355 
applications, 179 
detectors, 188 
equipment, 188, 191 
formats in, 188, 190 
free solution, 179, 183, 186, 
188 
flow rates, 188 
interfaced with other methods, 
190 


isotachophoresis (CITP), 129, 
187-8 
loading volumes in, 183, 188 
physical basis, 183-8 
sensitivity, 183, 188 
glass, 183 
Carbon tetrachloride, 43 
Carbonic anhydrase, 169 
Carbonylation, 372 
Carboxymethyl (CM) group, 21, 
24-5 
Carboxypeptidases, 130, 132 
Caterpillar model, of DNA 
electrophoresis, 182 
CATH database, 307, 331, 333 
Cathodic drift, 349 
Cation exchangers, in 
chromatography, 23-5, 49, 
282 
cDNA, 362-3 
Celera, 309, 315 
Cell 
collision, 127 
cycle, 288, 352 
physical attributes, 288 
populations, 286 
sorting, 286 
transformation, 367 
Centrifugal 
concentrators, 283 
force (F), 266, 270 
relative (RCF), 270-71 
Centrifugation, 266, 268-79 
analytical, ultra-, 271-2, 274 
applications, 272-9 
differential, 268 
equipment, 269 
isopycnic, 273 
physical basis, 266, 268-9 
zonal, 273 
Centrifuge, 269-70 
benchtop, 271 
minifuge, 271 
ultra-, analytical, 271 
Celsius scale, 8 
Chain 
entropy in proteins, 201 
termination, see DNA sequencing 
Chaotropic agents, 25, 33, 160, 
302 
Chaperones, 206-9, 211-2 
Chaperonins, 206-9 
Charge, 113 
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-coupled device camera (CCDC), 
107, 236 
distribution, 5 
isomerization, 130-1 
Chemical 
ionization mass spectrometry 
(CI/MS), see Mass 
spectrometry 
shielding, 93 
shift, 93-4, 215 
Chemiluminescence, 66-8, 89 
Chime plug-in, 307, 334 
Chimera, 36 
Chip 
DNA, see Microarrays, 
protein, 359, 362-3 
technologies in proteomics, 359, 
362-6 
Chiral 
centres, see also Centres of 
asymmetry, 252 
mixtures, 78 
molecules, 78-9 
separation, 183, 190 
Chirality, 78-9, 227 
of amino acids, 79 
Chloroform, 43 
Chlorophyll, 288 
2-Chloropropane, 43 
Chromatin, 84, 288 
Chromatofocusing, 170, 172-3 
Chromatography, 11-50 
activated resins, 31, 33-4 
adsorption, 12 
affinity, 31-7 
buffers, 20 
C-18 reversed phase, 12 
desalting, 27 
effect of 
flow rate, 18-9, 44-5 
pressure, 44-5 
elution, 21 
equipment used, 20-2 
gas (GC), 13-4 
GC/MS, 128-9 
gel filtration, 25-8 
gradients, 21 
high resolution, 40-7 
hydrophobic interaction, 199 
hydroxyapatite, 37, 39 
ion exchange, 11-2, 170, 172 
isocratic, 21 
lectin affinity, 355 
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Chromatography (Cont.) 
liquid (LC), 12-3 
liquid-liquid, 12 
membrane-based, 45-7 
modes, 22-37 
normal phase, 29 
open column, 37-40 
partition, 12 
peak broadening in, 15 
performance parameters, 14—20 
perfusion, 44-5 
principle, 11-2 
radial flow, 39 
reversed phase, 28-30, 282 
size-exclusion, see Gel filtration 
system, 13 
thiol exchange, 355 
Chromogenic substrates, 64-5, 192 
Chromophore, 58, 288 
extrinsic, 64 
intrinsic, 64 
Chromosome, 179, 288 
size, 179-90, 182 
Chymotrypsin inhibitor, 203 
Chymotrypsinogen A, 27 
Circular dichroism (CD), 79-83, 
223 
equipment used, 80-1 
in structural studies, 81-3 
in binding studies, 83 
of nucleic acids, 83 
spectra, 80, 82 
Circularly polarized light, see also 
Dichroism, 78—9 
CJD, see Spongiform 
encephalopathies, 
transmissible (TSE), 
35¢C] 
AMU, 117 
% abundance, 117 
37¢C] 
AMU, 117 
% abundance, 117 
Cloning, positional, 77 
Clustal W, 307-8, 318, 321 
CM (carboxymethyl), 21, 24-5 
Cochaperones, 206 
Co-crystallization, 230, 260, 331 
Codon 
start, 310 
stop, 310 
termination, 310 
usage, 310 


Coefficient 
Diffusion, 71-2, 107 
mass transfer, 109 
Coenzymes, 31, 48 
Coherent 
light, 100, 107 
scattering, 236 
Collimator, 253 
Collision 
cell, 127 
-induced dissociation (CID), 127 
theory of chemical reaction, 
293-4 
Colloids, random, 200 
Column chromatography, 11 
Competent cells, 196 
Complement 
pathway, 280 
serine protease (CI), 280 
Conductance detection in CE, 188 
Constructive interference, 87-8, 
100, 237, 238 
Constraints 
in structure refinement, see also 
Distance constraints, 
Contig (contiguous clone map), 315 
Convection, 45 
intraparticle, 45 
Coomassie brilliant blue, 63, 151, 
174, 195-6 
Core stream, 286 
Correlation effect 
heteronuclear, 221 
spectroscopy (COSY), 214, 
216-8, 219 
Couette 
flow cell, 84 
viscometer, 265 
Coulombic explosion, 122 
Counterion, 25, 28, 
Coupling constant, 95, 215, 216 
Creatine, 97 
phosphokinase, 27, 97 
Creuzfeld-Jakob Disease (CJD), see 
Spongiform 
encephalopathies, 
transmissible (TSE), 
Cross-flow filtration, 47 
Crossed-field gel electrophoresis, 
see Rotating-field gel 
electrophoresis 
Cryo-electron microscopy (cry-EM), 
254 


cryoprotectant, 235 
Crystal, 225-9 
alignment, 239 
asymmetric unit, 227 
cubic, 228-9 
effect of temperature, 231 
flash freezing, 235 
indexing, 239 
inorganic, 240 
lattice, 227-9 
constants, 228-9 
mechanical stability of, 233 
mosaicity, 228 
mounting, 233-6 
space group, 227-9 
static disorder, 256 
symmetry, 226-8 
systems, 227-8 
twinning, 231 
two-dimensional, 254 
Crystallization 
by desolvation, 232 
by evaporation, 232 
by free interface diffusion, 233 
by hanging drop, 232 
by microdialysis, 232 
by reducing protein solubility, 
231, 232 
by sitting drop, 233 
by vapour phase diffusion, 232-3 
conditions, 231 
methods, 231-4 
physical basis, 228-31 
solution, 233 
in space (microgravity), 231, 233 
Cuvette, 62, 68 
Cyclohexane, 43 
Cyclotron, 124 
Cystamine, 378-9 
Cysteic acid, 138 
Cysteine 
alkylation of, 138 
counting of, 138 
ICAT labelling, 258 
oxidation of, 135, 372 
proteases, 366 
reduction of, 135, 138 
spin labelling of, 99, 101-2, 217 
Cysteinylation, 372 
Cytidine deaminase (CDA), 142 
Cytochrome C 
absorption spectrum of, 60 
pl of, 169 


Cytochrome P-450, 272 
Cytokines, 302 
Cytoskeleton, 342 
Cytosine, 245 


A>-3-Ketosteroid isomerase, 220 
DAPI (4’,6’-Diamidino-2- 
phenylindole), 
85 
Dark complex, 70 
Databases, 9, 305-7 
CRYSTME, 329 
expressed sequence tag (EST), 
315, 316 
genome, 306, 309-10, 314-5 
Inorganic Crystal Structure, 329 
Macromolecular Structural 
Database (MSD), see PDB 
mining, 316 
protein (PDB), 257-9, 305, 
329-31, 334 
file organization, 331-2 
searching, 331 
Research Collaboratory for 
Structural Biology, 305 
sequence, 118, 257, 305 
single nucleotide nucleotide 
polymorphism (SNP), 315 
specialist structural, 331, 333-4 
structural, 305 
tertiary structure, 327-34 
tools, 307 
two-dimensional electrophoresis, 
352 
UniProt, 306, 312-3 
Daughter ions, 127 
Dayhoff, Margaret, 305 
DEAE (diethylaminoethy]), 21, 
24-5 
Deamidation, 130-1 
of asparagine, 131 
of glutamine, 131 
Deblocking, 134 
incomplete, 133 
De Broglie, Louis, 55 
De Broglie’s relationship, 55 
Dehydroanaly] derivative, 376-7 
Dehydrogenase 
activity stain, 155-6 
assay, 63 
DeLano, Warren, 334 
Denaturation 
of proteins, 4, 292 


Dendrogram, 180 
Densitometer, 151, 236 
micro-, 254 
Density gradient, 241 
centrifugation, 273-4 
Derivative spectra, 98 
Derivatization, 113 
Desalting, 125 
by centrifugal concentrators, 
282-3 
by dialysis, 24, 281-2 
by gel filtration, 27 
by ultrafiltration, 281 
Desorption, selective, 119 
Destabilising forces in proteins, 
202-3 
Destructive interference, 87-8, 237 
Detergent, 7—8, 255 
action, 155 
Detoxification enzymes, 30 
Deuterium, 357 
Dexrorotatory, 79 
Dextran, 26 
Diagonal gel electrophoresis, 375-6 
Dialysis, 24, 232, 282-4 
4’,6’-Diamidino-2—phenylindole 
(DAPI), 85 
Diastereoisomers, 78—9 
Dichroism 
circular (CD), 79-83, 300 
equipment used, 80-1 
in structural studies, 81-3 
in binding studies, 83 
of nucleic acids, 83 
spectra, 82 
induced, 83 
linear (LD), 83-4, 265 
Dicumarol, 83 
Dicyclohexylcarbodiimide, 378 
in immobilization of affinity 
ligands, 34 
Dielectric constant, 285 
Diethyl pyrocarbonate, 285 
Diethylamino ethyl (DEAE), 21, 
24-5 
Difference 
gel electrophoresis (DIGE), 352, 
354 
spectroscopy, 61, 86, 89 
spectrum, 61 
Differential scanning calorimetry 
(DSC), 82, 297, 300-2 
applications of, 301-2 
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Diffraction, 54 
grating, 61, 86 
of electrons, 254 
of light, 54 
of neutrons, 254 
of X-rays, see also X-ray, 225-6, 
235-55 
patterns, 237, 240, 245 
Diffusion 
coefficient, 268, 275 
and fluorescence, 71-2 
in CE, 183 
in SPR, 107 
eddy, 15-6 
of electrons, 122-3 
in chromatography, 15 
intraparticle, 45 
longitudinal, 19 
rate, 107 
Diffusive 
flux (Jp), 275-6 
pores, 45 
Digitonin, 8 
Digoxigenin, 77 
Dihedral angles, 215, 252, 342 
2,5-Dihydroxybenzoic acid, 120 
Dimethyl suberimidate, 160 
Dimethylsulphoxide (DMSO), 
43 
Dinitrophenol (DNP), 65, 372-3 
Diode array, 62 
1, 4-Dioxan, 43, 231 
Disaggregated suspensions, 287 
Dispersive infrared 
spectrophotometer, 86 
Displacement parameter, 252 
Displacers, in chromatography, 22, 
29 
Dissociation constant (K p), see also 
Equilibrium and Ligand 
binding constants, 72, 297 
Distance constraints, 214-5, 
218-9 
Distance geometry, 219 
Distance matrix, 336 
Disulphide bridge, 201-2 
formation, 130, 132, 135, 302 
identification, 133, 135, 138-9 
reduction of, 138, 139, 157-8 
Dithiothreitol (DTT), 8, 158, 302, 
372-3 
Divergence, 325 
DL convention, 79-80 
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DNA, 43, 67 

alternative splicing, 366 

automated sequencing of, 164—5 

Axial ratio of, 178 

base stacking, 298 

-binding protein, 165-6 

charge distribution in, 147 

coding, 309 

Database of Japan (DDBJ), 306, 
309 

daughter strand, 161, 163 

denaturation of, 64, 192, 194, 293 

double-stranded, 85, 169 

viscosity of, 266 
minor groove, 85 

duplex formation in, 74, 298 

electrophoresis of, 161, 163-6, 
177-90 

ethanol precipitation of, 162, 285 

footprinting, 165-6 

fragmentation, 289 

hybridization, 288 

hypochromicity of, 61 

intercalation in, 266, 269, 273 

ligand binding to, 85, 269 

Amax, 60 

mechanical shearing, 179 

melting temperature (Tn), 64, 194 

molecular mass from agarose 
gels, 178 

minisatellite, 314 

mutation, 166-8 

NMR analysis of, 221, 224 

non-coding, 165, 309 

oxidation of, 372 

phenol-chloroform extraction of, 
162 

polylinker site, 161 

precipitated, 285 

primer, 161 

probe, 192, 194 

-protein interactions, 166 

regulatory, 165-7 

restriction digestion of, 179 

-RNA hybridization, 90 

satellite, 165 

secondary structure, 90 

sequencing, 161-5 

short tandem repeat 
polymorphisms (STRP), 
314-5 

single-stranded, 167, 299 

Southern blotting, see Blotting 


staining in agarose gels, 179 
sugar puckering in, 252 
template, 164 
tetraplex, 74 
triple helix, 224 
truncated fragments of, 161, 164 
vector uptake efficiency, 196 
viscosity of, 266 

DNAase I, 165-6 

DnaJ, 166 

Dna K, 206-7 

Doolitle, 320, 325 

Downstream processing of proteins, 

39-40, 263 

Drift tube, 120, 124 

Drug design, 346 

DsbA, 375 

Dye primer DNA sequencing, 164—5 

Dye terminator DNA sequencing, 

164-5 

Dynamic 
mobility, 99 
quenching, 69-70 

Dyneins, 342 


Eddy diffusion, 15-6 
Edman degradation, 131, 134, 
139-41 
EDTA, 8 
EGTA, 8 
Elasticity, of DNA, 178 
Electric 
charge, 147-8, 287-8 
field, 147, 284 
geometry, 181 
homogeneous, 181, 182 
inhomogeneous, 181 
inversion of, 182 
orientation in linear dichroism, 
83-4 
strength (E), 147, 181 
Electroblotting, 190-96 
apparatus, 190 
outline of experiment, 192 
physical basis, 190 
polypeptide preparation by, 193, 
196-7 
semi-dry, 190 
Electrochemical detection, 188 
Electrode 
array, 179 
end cap, 125 
ring, 125 


Electrodialysis, 283-4 
Electroelution, 149, 153, 195 
Electrolyte chamber, 188 
Electromagnetic radiation 
amplitude, 53-4 
frequency, 53 
microwave, 57-8 
nature of, 53-5 
radiowave, 57-8 
spectrum, 55-8, 98 
visible, 57-61 
X-ray, 57-8 
Electron 
beam, 113-4, 254 
capture dissociation (ECD), 
126-7 
density, 239 
equation, 250 
map, 236, 240; 
one-dimensional, 250; 
three-dimensional, 250-1; 
calculation of, 239, 250-55; 
error in, 251 
waves, 250 
diffraction, 254-5 
resolution of, 254 
impact ionization, 113-5 
microscopes, 254 
oscillation, 236 
paramagnetic resonance (EPR) 
see ESR 
paired, 65 
scattering factor (f), 236 
storage ring, 253 
transfer dissociation (ETD), 
126-7 
transport chain, 372 
unpaired, 65, 96 
Electronic 
detectors, 236 
energy levels, 55 
quantum number, 55 
transitions, 55-7, 100 
Electron spin resonance (ESR) 
probes, 99-100 
spectrum, 98 
effect of immobilization on, 
100 
spectrometer, 98 
spectroscopy, 96-9 
equipment used, 98 
physical basis of, 96, 98 
Electroosmosis, 149, 175 


Electroosmotic flow, 175-6, 185, 


187, 190 


Electrophoresis, 147—196 


agarose gel, 149 
capillary (CE), see Capillary 
electrophoresis 
constant power, 148 
constant voltage, 148 
denaturing, 155-60 
direction, 179 
free flow, 353-6 
free solution, 148 
gel, 147-53 
heat dissipation in, 148 
historical development of, 
148-9 
gels, 150-1 
discontinuous, 151-2 
gradient, 150 
staining methods in, 151 
historical development of, 148-9 
interfaced with MS, see MS 
mobility (u), 48, 174, 183 
moving boundary, 149 
non-denaturing, 153-5, 350 
physical basis of, 147-8 
pulsed field gel, 179-83 
applications of, 182-3 
contour-clamped homogeneous 
electric field gel, 
electric field gel, 180, 182 
equipment, 181-2 
field-inversion gel, 182-3 
orthogonal field alternating gel, 
182 
physical basis of, 179-81 
rotating gel, 183 
sodium dodecyl] sulphate 
polyacrylamide gel (SDS 
PAGE), 155-60, 191, 350, 
375, 376 


interactions, 201, 263 
repulsion, 39, 84, 202 
f-elimination, 376-7 
ELISA (enzyme-linked 
immunosorbent assay), 76, 
108 
Ellipticity, 80-1 
Elliptically polarized light, 81 
Elution 
batch flow, 21—2 
continuous flow, 21 
displacement, 22 
from affinity chromatography, 
33 
from hydroxyapatite, 37 
gradient, 21-2 
isocratic, 21 
profile, 21 
stepwise, 22 
Elutropic series, 43 
EMBL, see also European 
Bioinformatics Institute 
(EBI), 310 
Emission spectrum, 64 
Enantiomer, 78-9 
resolution, 129 
Enantioselective synthesis, 79 
Endopeptidases, 130 
Endoplasmic reticulum, 372, 
375 
Endothermic reactions, 292, 295 
Energy 
level, see also Transitions, 
214 
electronic, 55-6 
excited, 55, 59, 102 
free, see Free energy 
ground, 55, 59, 102 
internal, 294 
intermediate, 102 
minimisation, 329 
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chain, 201-2 
change (AS), 295 
Enzyme 
assay, 62-3 
cofactor, 31, 47-8 
continuous assay, 63 
inhibitor, 31 
-linked immunosorbent assay, see 
ELISA 
restriction, 179-80 
stopped assay, 63 
substrate, 31 
Epitope, 17-3 
Epoxide, 367 
Equatorial domain of GroEL, 207 
Equilibrium 
constant 3, 72, 203, 301 
dissociation and ligand binding 
constants, 3, 284 
from analytical 
ultracentrifugation, 280 
dialysis, 72, 284 
magnetization vector, 212 
Equivalence point, 174-5 
Erythrocyte, 137, 255, 289 
Equivalence point, 174 
Erythropoietin, 137, 289 
Esterification, 113, 125 
ESyPred3D, 307, 342 
Ethacrynic acid, 260 
Ethanedithiol, 378, 379-80 
Ethanol, 43 
NMR spectrum of, 94 
precipitation of DNA with, 285 
specific heat capacity (Cp), 296 
Ethidium bromide, 67, 150-1, 179, 
269 
Ethyl acetate, 43 
Ethyl ether, 43 
Ethylene oxide, 141 
Ethylenediaminetetraacetic acid 


limitations of, 157 potential, 276-9 (EDTA), 8, 35 
submarine, 177-8 quantised, 56-8 European Bioinformatics Institute 
urea, 160 rotational, 57-8 (EBD, 306-7 
Electrophoretic flow, 185-7, 190 vibrational, 57—8, 64 Evanescent 
Electroporation, 196 kinetic, 5 field, 105 
Electrospray ionization mass minimization, 336, 341 wave, 105 
spectrometry (ESI/MS), see Enterokinase, cleavage site in Ewald 


construction, 239-40 
circle, 239-40 
sphere, 239-40 
Excited state, 66 
lifetime, 64, 70-1, 73 


Mass spectrometry his-tag fusion, 38 
Electrostatic Enthalpy (H), 294, 297 
attraction, 39, 84, 147-8, 186, change in (AH), 294, 299 
202 Entropy (S), 205, 231, 294-5, 


field, 125 297 
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Excluded volume, 15 

Exoglycosidase, 372 

Exopeptidases, 130 

Exothermic reactions, 292, 295 

ExPASy, 306, 308 

Expression systems, 47 
heterologous, 39, 47 

Expressed sequence tags (EST), 315 

Extinction coefficient, molar (£), 58 


19F 212 
F,-ATPase, 102 
Factor VIII, 108 
FastA, 318-21 
Fast atom bombardment mass 
spectrometry (FAB/MS), 
119-21 
Fast atom gun, 123 
Fast protein liquid chromatography 
(FPLC), 43-4 
“Fe, scattering factor, 236 
Fenn, John, 355 
Ferguson 
gel, 153 
plot, 154 
Fibroblast, 300 
Fick’s first law, 275 
Ficoll, 273 
FISH (fluorescence in situ 
hybridisation), 77 
Filtration, sterile, 282-3 
Flow 
cell, 287 
core, 286 
cytometry, 286-90 
applications of, 287—90 
design, 286-7 
detectors in, 287, 288 
parameters measurable by, 288 
electroosmotic, 175 
electrophoretic, 185-8 
make-up, 129 
Newtonian, 204 
non-Newtonian, 204—5 
orientation in linear dichroism, 
83-4 
parabolic, 263-4 
Fluor, 64—7 
extrinsic, 64-5, 67, 288 
intrinsic, 64, 66-7, 288 
structures, 67 
Fluorescein, 67 
isothiocyanate (FITC), 76, 289 


Fluorescence, 64-77, 223, 352 
activated cell sorting (FACS), 289 
dark complex, 70 
detection 

in centrifugation, 271 

in chromatography, 20 
effect of temperature, 71-2 
effect of viscosity, 72 
external quenching of, 69-72 
immunostaining, 75-6 
in binding studies, 72, 74 
in cell biology, 75-7 
in protein folding, 73 
in situ hybridisation (FISH) 77 
internal quenching, 70 
intensity, 73 
labelling of cells, 286 
labelling of DNA, 77, 164-5, 362 
labelling of proteins, 354 
laser-induced, 102, 188 
lifetime, 70, 73 
measurement of, 68—69 
microscopy, 75-7 
physical basis, 66 
quenching, 68-73 
resonance energy transfer 

(FRET), 73-5, 103 

sensitivity of, 69 
spectroscopy, 64-77 
spectrum, 66 
steady state, 72 
time-resolved, 72 
uses of, 72-7 

Fluoride, 353 

Fluorophore, see Fluor 

Fluorophosphonate, 367 

Folding 
funnel, 205 
intermediates, 203-4 
of proteins, 64, 199-212 

biased, 204 
unbiased, 204 
traps, 205 
Food and Drug Administration 
(FDA), 39 

Formamide, 166 

Formazan, 153, 155 

Four-sector MS, see MS, 


Fourier 
analysis, 383 
Joseph, 383 


series, 250, 383 
synthesis, 243, 250-1, 254, 


transform, 87, 124-5, 213, 383-6 
infrared (FTIR) spectroscopy, 
87-90 
MS, see MS 
NMR, 96, 216-7 
Fragmentation in MS, 113-115, 
125-7 
Framework model of protein 
folding, 204-5 
Free 
energy (G), see Gibb’s free energy 
induction decay (FID), 124, 213, 
215-6 
interface diffusion, 233 
radical, 372 
solution CE, 179, 183 
Freeze-drying, see Lyophilization 
FRET (fluorescence resonance 
energy transfer), 73-5 
Frequency, 53-5 
factor (A), 292 
fundamental, 85 
group, 87 
labelling, 215 
Larmor, 92, 213 
of electromagnetic radiation, 
53-5 
of oscillation, 92-3 
radiofrequency, 57, 92, 98 
threshold, 55 
Friction, 263 
Frictional 
coefficient ( f), 147-8, 266, 268 
force (F*), 266 
Friedel 
law, 245, 250-51 
mates, 245, 250-51 
Fungal Genome Resource, 306 
Functional effects, 3 
Fundamental absorption, 85 
Fundamental frequency, 85 
Fuschin core, 372 
Fusion proteins, 31, 35-6, 38 
GST, 35-6, 189-90 
his-TAG, 35, 37-8 


Galactose, 65 
B-Galactosidase, 27, 65 
Gapped alignments, 318 
Gas chromatography (GC), see 
Chromatography 
Gel 
average, 352 


electrophoresis, 149-53 
crosslinked, 149 
non-crosslinked, 149 
non-restrictive, 172 
porosity, 149 
replicates, 352 
representation, 365 
restrictive, 14 
staining, 150-51, 
spacer, 151-2 
stacking, 151-2 
submarine, 177 
resolving, 151-2 
filtration chromatography, see 
also Chromatography, 25-8 
exclusion limit, 25, 26 
fractionation range, 26, 
native mass determination, 27-8 
in protein purification, 28 
GenBank, 309 
Genetic map, 310, 314 
Gene 
expression, 362 
matrix, 362 
walking, 163-4 
Geno3D, 342, 344-5 
Genome, 1, 306, 309, 349-50 
genetic map, 310 
physical map, 314 
size, 180, 309 
yeast, 315 
Genomics, functional, 359 
Genolevures, 306 
Gibbs 
free energy (G), 202, 276, 296 
change (AG), 202, 296 
Willard, 296 
Glide plane, 227 
Global energy minimum, 252-3 
Glucose, 372 
oxidase, 169, 257 
Glucuronide permease, 322 
Glutaredoxin, 374 
Glutathione, 31, 374 
-agarose, 21 
reductase, 374 
transferase (GST) 30, 48-50 
ligand binding to, 74 
three dimensional structure of, 
258 
Glutathionylation, 372 
Glycan branching, 370 
Glycine, 79, 


anion, 152 
titration curve, 4 
zwitterion, 152 
Glycopeptide, 371-2 
Glycoprotein, 157, 255 
oxidation of, 370-2 
Glycosylation, 125, 130-1, 133, 
137, 367, 369-72 
Goniometer head, 234—5 
GOLD, 346 
Gorg, A, 351 
Good, N.E. 7 
Good’s buffers, 7 
Goodsell, David, 346 
GOR (Garnier-Osguthorpe Robson) 
algorithm, 325 
Gradient gel electrophoresis, see 
Electrophoresis 
Gradient 
binary, 22 
elution, 21-2 
gel, 151 
maker, 22 
quaternary, 22 
ternary, 22 
Granularicity, 287-8 
Grating infrared spectrophotometer, 
86 
Green fluorescent protein (GFP), 
89 
Ground state, 66, 235 
Group frequency, 87 
GroEL, 207-12 
allosteric changes in, 20 
ATPase activity of, see ATPase cis 
ring, 
cis ring, 2078-9 
interaction with GroES, 209-11 
mass cut-off, 211-2 
structure of, 207-9 
substrates, 300-11 
trans ring, 208-9 
unfolding activity, 209 
GroES, function of, 209-11 
Ground state, see also Energy levels, 
56, 66 
Group-specific ligands, 31, 35 
GrpE, 207 
GST (glutathione transferase), 325 
Guanidination, 359 
Guanidium hydrochloride, 8, 266, 
302 
Guanine, 141 
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Guard column, 40 
Gyromagnetic ratio, 93 


1H, 117, 223 
AMU, 117 
% abundance, 117 
scattering factor, 236 
°H 
AMU, 117 
% abundance, 117 
Haem group, 60 
Haemoglobin, 136, 242 
absorption spectrum of, 61 
Haemophilia A, 108 
Half-mirror, 100, 103 
Hanging drop, 232 
Harker construction, 244 
Heat 
capacity, 296 
specific, 297-300 
dissipation, 129, 183 
generation in electrophoresis, 
148, 183 
gradient, 188 
shock proteins, 206 
transfer, rate of, 188 
Heavy metal, 242-3 
a-Helix, 80-3, 89, 324-5, 331, 334 
Height equivalent to a theoretical 
plate (HETP), 16 
Henderson-Hasselbach equation, 3 
Heparin, 302 
Heteroatoms, 331 
2-Hexadecanonoylamino-4- 
nitrophenylphosphorylcholine, 
65 
n-Hexane, 43 
High performance liquid 
chromatography (HPLC), 
23, 30, 40-3, 50 
detection, 41 
industrial-scale, 39 
interfaced with MS (LC/MS), 
128-9 
pumps, 40-1 
stationary phase, 41-2 
High pressure slurrying procedure, 
42 
High temperature spike, 119 
Hippocalcin, 316-7 
His-tag, 35 
Histidine, 7, 35 
absorbance spectrum of, 60 
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HIV protease, 78 
HL-60 cells, 356 
Homology, 310 
modelling, 336, 340-6 
Homoarginine, 359 
Horeseradish-peroxidase, 175, 192 
Hsp 10, see GroES 
Hsp 40, see Dna J 
Hsp 60, see GroEL 
Hsp 70, see DnaK 
Human genome project, 165, 
309-10, 314-5, 329, 355 
Hybridisation 
in situ, 77 
stringency, 193-4 
in Southern blots, 192, 194 
in microarrays, 362-3 
Hybridoma cells, 172, 174 
Hydration layer, 185, 187 
Hydrazine, 370-2 
Hydrazone, 370, 372 
Hydrodynamic properties, 263 
Hydrogen 
bonds, 6, 150, 177, 200, 201, 248, 
251, 263, 323 
emission, 54-6 
exchange, 210-1 
Hydropathy plots, 319-20, 322 
Hydroperoxides, 374 
Hydrophilic interactions, 28-9, 
155-6, 
Hydrophilicity, 5 
Hydrophobic 
collapse model of protein folding, 
204-5 
effect, 201-2 
forces, 201-2 
interaction, 29, 296 
chromatography, 29, 30, 200 
regions, 322 
Hydrophobicity, 5, 28, 29, 31, 202 
Hydrostatic pressure, 37, 39, 274 
Hyroxyapatite chromatography, 
37 
7-(2-Hydroxyethyl)guanine, 141 
2-Hydroxy-5-nitrobenzyl alcohol 
(HNB), 116-7 
Hydroxylamine derivatives, 372 
Hyperchromicity, 60-1 
Hypertext markup language 
(HTML), 9 
Hypertext transfer protocol (HTTP), 
9 


IDA, 21, 35, 37 
IgG, 172, 174, 177, 191-2 
variable region, 173 
constant region, 173 
Identity, immunological, 174-5 
Image analysis, 107, 151, 254-5, 
352—4 
Imaginary number (i), 384 
Iminodiacetic acid (IDA), 21, 35, 37 
Immobilines, 168-9 
Immobilized metal affinity 
chromatography (IMAC), 
see also Chromatography, 
35, 37-8, 355, 362 
Immunoelectrophoresis, 172-7 
counter, 176-7 
crossed, 176-7 
rocket, 174-6 
zone electrophore- 
sis/immunodiffusion, 174, 
176 
Immunodetection, 75-6, 172 
Immunodiffusion, 172 
double, 174-5 
Immunoelectrophoresis, 149 
counter, 176-7 
crossed (CIE), 176-7 
rocket, 174-6 
Immunogenicity, 173 
Immunoglobulins, see Antibodies, 
Immunoprecipitation, 285, 372 
Immunostaining, 68, 75-6 
direct labelling, 75-6 
indirect labelling, 75-6 
Inclusion bodies, 47, 212 
Infrared 
equipment, 86 
Fourier transform, 87—90 
group frequencies, 86-7 
physical basis, 84-6 
Raman, 90-1 
spectra, 85-6 
spectroscopy, 84-91 
dispersive, 87 
uses of, 86-7 
Institute of Genome Research 
(TIGR), 306 
In situ hybridisation, 77 
Intensity, of Bragg reflections, (J},;7), 
240-1 
Interchain disulphides, 375-6 
Interfaces in two-dimensional 
analysis, 127-9 


analyte enrichment, 128 
electrospray, 129 
ionspray, 128 
Interference, 54 
constructive, 87-8, 100, 237 
destructive, 87-8, 237 
in analytical centrifugation, 271 
Interferogram, 385 
Intermediate domain of GroEL, 
207-8 
Internal energy, 294 
Internal quenching, 68 
Intrachain disulphides, 375-6 
Intron, 310 
Inversion 
centre, 227 
population, 100, 102 
Todoacetanilide, 359 
Iodoacetamide, 357-8, 372-3 
-fluorescein, 372-3 
N-substituted, 359 
Iodoacetic acid, 372, 378-9 
Iodoacetoxy group, 357, 372 
1-Iodocytosine, 245, 247 
6-Iodocytosine, 247 
5-Iodouracil, 245, 247 
Ton 
background, 117 
beam, 114-5 
daughter, 127 
exchange, 22-5, 282 
groups, 24 
under denaturing conditions, 25 
titration curve analysis, 170 
fragment, 113, 115, 117-8 
leading, 186, 188 
matrix cluster, 119 
mobility, 186 
molecular, 115, 117, 121 
multiply-charged, 122 
pair reagents, 28 
parent, 127 
peptide, 125-7 
quasimolecular, 118, 119, 125 
spray ionization mass 
spectrometry (ISI/MS), see 
MS 
terminating, 185 
Ionization 
atmospheric pressure ionization, 
121 
chamber, 114 
chemical (CI), 118 


electron capture, 115, 117 
electron impact (EI), 113-4 
in FAB/MS, 119 
in MS, 118-22 
in PDI/MS, 120 
ion spray, 122 
iProClass, 307, 327 
Isodichroic point, 82 
Isoelectric 
focusing (IEF), 167-71, 350, 353 
experimental format, 170-1 
glass tubes, 171 
point, (pI), 23, 167, 169-70, 
351-2 
Isoenzyme, 155 
Isomers, optical, see Enantiomers 
Isomorphous replacement, 242—4 
multiple, 242-4 
single, 242-4 
Isopycnic 
centrifugation, 273—4 
density, 273 
Isosbestic point, 59-60 
Isotachophoresis, see also capillary 
isotachophoresis (CITP), 
129, 152-3, 186, 188, 353 
Isothermal 
heat absorption, 295 
titration calorimetry (ITC), 295 
curves, 297-9 
4-Isothiocyanato-TEMPO, 99, 100 
Isotope 
abundance, 117 
coded affininity tags (ICAT), 
356-8, 360, 372-3, 377, 
379 
mass, 117 
in MS, 117 
in NMR, 93, 97 
peaks, 117 
Isotropic hyperfine coupling 
constant, 98 
Iterative annealing, 212 


J-constant, 95, 215, 216 
J-coupling, 94, 215 
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AMU, 117 

% abundance, 117 
0K 

AMU, 117 

% abundance, 117 


“K 

AMU, 117 

% abundance, 117 
Ka rays, 234 
Karplus equation, 215, 216 
Karyotyping, 288 
Kendrew, 242 
A°-3-Ketosteriod isomerase, 

220 

Kinesin, 102, 342 
Kinetic energy, 5 
Kronig-Kramers transformation, 80 
Krypton, 123 
Kuru, 184 
Kyte, 320, 325 


Labelling 
direct, 75-6 
indirect, 75-6 
Labile region, of phase diagram, 
230-1 
B-Lactoglobulin, 168 
Lactate dehydrogenase (LDH), 27, 
156, 272 
Lactic acid, 115 
Lande splitting factor, 98 
Larmor frequency, 92, 213 
Laser, 99-100, 102-3 
applications of, 100, 102-3 
in MALDI/MS, 121 
design, 103 
elliptical, 287 
-induced fluorescence, 102 
origin of, 100 
ruby, 100 
scanning, 102 
tunable, 100, 103 
Lattice 
body-centred, 228-9 
Bravais, 228-9 
C-centred, 228-9 
constants, 226 
cubic, 228-9 
face-centred, 228-9 
hexagonal, 228-9 
monoclinic, 228-9 
orthorhombic, 228-9 
plane, 238-9 
primitive, 228-9 
reciprocal, 238-9 
tetragonal, 228-9 
triclinic, 228-9 
trigonal, 228-9 
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LC/MS, 128-9 
LC-tandem MS, 358 
Lectins, 355, 370-1 
Leupeptin, 8 
Levinthal’s paradox, 204-5 
Levorotatory, 79 
Library of compounds in MS, 117 
Ligand, see also Affinity 
chromatography 
affinity by CE, 190 
binding, 276 
and atomic force microscopy, 
103-104 
and circular dichroism (CD), 
83 
and fluorescence, 72—74 
and surface plasmon resonance, 
106 
constant (Kg), see Equilibrium 
constants 
utilization in membrane-based 
chromatography, 46 
Light amplification by stimulated 
emission of radiation, see 
Laser 
Light, see also Electromagnetic 
radiation 
absorption, 58—60, 79 
coherent, 100 
diffraction, 55-6 
elliptically polarized, 81 
emitted, 64, 69 
incident, 58 
infrared, 57, 84 
laser, 99-100, 102-3 
linearly polarized, 77 
monochromatic, 61 
non-coherent, 100 
nature of, 53-5 
particulate nature of, 55, 57 
plane-polarized, 73, 77-9 
quantum theory of, 54 
refraction of, 79 
scattering, 90-1 
coherent, 100 
forward, 287 
non-coherent, 100 
low angle, 287 
right-angle, 287 
transmitted, 58 
ultraviolet, 57 
visible, 55-7 
Limiting sphere, 239, 251 
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Linear dichroism (LD), see also 
Dichroism 83-4 
orientation methods, 84 
spectrum of DNA, 85 
Linearly polarized light, see also 
Dichroism, 77 
Linkage, 310 
Lipid monolayer, 254 
Lipid peroxidation, 372 
Lipoproteins, 273 
Local energy minimum, 252-3 
London, dispersion forces, see van 
der Waals interactions, 
Loop searching, 340-1 
Lorentz factor (L), 241 
Low performance LC, see 
Chromatography, open 
column 
Luciferase, 66, 211 
Luciferin, 66 
Luminol, 67-8 
Luminometer, 68-9 
Lupus erythematosus, 362 
Lyophilization, 286 
Lysine 
guanidination, 359 
in actin, 267-8 
in affinity chromatography, 31 
in crosslinking, 159 
titration curve, 4 
Lysozyme 
absorption spectrum, 60 
crystals of, 237 
folding, 204 
NOESY spectrum of, 218 
pl, 169 
Lysylendopeptidase (Lys-C), 137 


M13 bacteriophage, 161-2 
Magic angle, 224-5 
Magnetic 
dipole, 202 
field, 92, 123 
focusing of ions in MS, 123 
moment, 91 
31P, 96 
quantum number, 92-3, 98 
resonance imaging (MRI), 96 
spin, 64, 91, 93 
coupling, 94-5, 215 
states, 92 
Magnetization (M), 212, 217 
longitudinal component, 212-3 


macroscopic, 212-3 
molecular, 212 


transverse component, 212-3 


vectors, 212-3 
Magnetogyric ratio, 93 
Maiman, Ted, 100 
MAL-6, 99 
Maleimides, N-substituted, 359 
Mannose, 372 
Martz, Eric, 334 
Mass 

/charge ratio (m/z), 113, 
123-4, 

buoyant (m), 266, 275 

in electrophoresis, 153-4 

monoisotopic, 117 

relative molecular, (M,), 1 

average relative molar, 122 

cutoff in gel filtration, 25-7 

cutoff in dialysis, 283 

cutoff in ultrafiltration, 

281-2 

in MS, 113, 117, 123 

native, 153-4 

of protein subunits, 25-7 

of biomolecules, 8 

spectrometry (MS), see MS 
spectrum, 113-4 
transfer 

mobile phase, 15-6 

rate, 109 

stationary phase, 15-6 

stagnant mobile phase, 15-6 
Matrix 

-assisted laser desorption 


ionisation mass spectrometry 


(MALDI/MS), see MS 

cluster ions, 119 

in MS, 119, 125 

liquid, 119, 121-2 

solid, 119, 120 
Maximal segment pair score, 317 
McKinnon, Roderick, 305 
Membrane 

-based chromatography, 45-7 

fluidity, 5 

nylon, 194 

proteins, 99 

PVDF, 195 

semi-permeable, 232, 282 
2-Mercaptoethanol, 8, 157-8 
Merrifield 

Bruce, 200 


solid phase peptide synthesis 
method, see Peptide 
synthesis 
Meselson-Stahl experiment, 273 
Mesophilic bacteria, 5 
Metabolome, 350 
Metastable region of phase diagram, 
230-1 
Methane, 118 
Methanol, 43, 230 
Methionine 
oxidation of, 132, 133, 372 
replacement with 
selenomethioinine, 246 
sulphone, 372 
sulphoxide, 372 
Methyl sulphonate (S) group, 24 
Micellar electrokinetic separation, 
190 
Micelle, 190 
Michael addition, 377, 379 
Michelson interferometer, 87-8 
Microarrays, 359, 362-3 
Microcrystals, 228 
showers of, 231 
Microdialysis, 140, 232 
Microgravity, 233 
Microheterogeneity of proteins, 
130-1, 230 
Microinjection, 196 
Microprojectile, 196 
Microsatellite, 315 
Microscopy, confocal, 102 
Microtitre plate reader, 62 
Microtubules, 277-80 
Microwave radiation, 57-8, 98 
Miller indices, 237, 243 
Minisatellite, 314 
Mirror image, non-superimposable, 
78 
Mirror plane, 227 
Mitochondria, 96-7 
Mobile phase, 11, 17 
Mobility, electrophoretic, 148 
Model 
homology, see Homology 
modelling 
mathematical, 376 
preliminary, 251-2 
refined, 252 
ribbon, 257-8 
space-filling, 257-8 
wireframe, 257-8 


MODELLER, 307, 342 
Modelling, structure, 251 
Molar extinction coefficient (£), 58, 
79-80 
Molecular replacement, 244-5 
Molten globule, 203, 205-6 
Momentum, 113 
Monochromator, 54 
graphite, 235 
Morse diagram, 58-9, 66, 86 
Mother liquor, 233-4 
mRNA, 194-5, 349, 359, 362 
poly-U, 31 
MS, 113-44 
analyser, 114, 123-5 
double-focusing magnetic 
sector, 123-4 
electrostatic, 123-4 
quadropole mass, 123-4 
time-of-flight (TOF), 120, 124 
atmospheric pressure chemical 
ionization (APCI/MS), 119, 
122 
caesium gun, 123 
chemical ionization (CI/MS), 
118-9, 123 
electron capture dissociation 
(ECD), 126-7 
electron impact (EI/MS), 113, 
118, 122-3, 142 
electron transfer dissociation 
(ETD), 126-7 
electrospray ionization (ESI/MS), 
118, 121-2 
of DNA, 139-40 
fast atom bombardment 
(FAB/MS), 118-20, 126, 
136, 138 
fast atom gun, 123 
Fourier transform MS (FTMS), 
124, 127, 142-3 
four-sector, 127 
in negative ion mode, 115 
in positive ion mode, 115 
in proteomics, 355-9 
interfaced with capillary 
electrophoresis (CE/MS), 
188 
interfaced with electrophoresis 
(electrophoresis/MS), 129 
interfaced with gas 
chromatography (GC/MS), 
128-9, 141 


interfaced with HPLC (LC/MS), 
127-8 
interfaced with other methods, 
127-9 
ion cyclotron resonance Fourier 
transform, 124—5 
ion spray ionization (ISI/MS), 
118, 122 
ionization modes, 118-22 
length of proteins, 127 
magnetic focusing, 123 
matrix-assisted laser desorption 
ionization (MALDD, 118, 
120-1, 138-40, 361, 362 
microheterogeneity of proteins, 
130-3 
of proteins and peptides, 125-7, 
130-43 
overview of experiment, 115, 117 
physical basis of, 113-5 
plasma desorption ionization 
(PDI/MS), 118, 120 
principle of, 113-8 
resolution, 123 
source, 115, 122 
tandem (MS/MS or MS?), 127, 
355, 377 
trajectory arc, 123 
uses in biochemistry, 129-143 
MTT ([3-(4,5-dimethyl-2-thiazolyl)- 
2,5-diphenyl-2H- 
tetrazolium bromide], 155 
Multiple drug resistance (MDR), 74, 
323 
Munich Information Centre for 
Protein Sequences, 303 
Mutation, detection, 167-8 
Myoglobin, 169, 242 
Mylar, aluminized, 120 
Myosin, 102, 342 


IN, 273 

ISN, 212, 220, 223, 273 

N-linkage, of sugars, 367, 369 

N-acetyl hexosamine, 137 

N-ethyl maleinide (NEM), 372, 377, 
378-9 

National Centre for Biotechnology 
Information (NCBI), 303-4, 
308, 315 

N,N'-Methylene bisacrylamide, 150 

N-formyl methionine, 130-1 

N-glycosidase, 370-2 
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N-hydoxysuccinimide, 361 
N-nicotinoxysuccinimide, 360 
N-terminal 
blocking, 130-1 
sequencing, 139, 359 
NAD+, 61, 288 
NADH, 61, 288 
Amax, 61 
Nascent polypeptides, 206-7 
Nebulising gas, 122 
Neutron 
beam, 254 
diffraction, 254 
Newton, Isaac, 54, 263 
Nile Red, 151 
Nitro blue tetrazolium chloride, 156 
Nitrocellulose, 118, 175, 193, 195 
4-Nitrophenol, 63 
Nitrosothiol, 372 
5-Nitroxyloxazolidine, 99 
NMR, see Nuclear magnetic 
resonance 
Normal phase chromatography, 29 
Non-coherent light, 100 
Nuclear magnetic resonance (NMR) 
spectroscopy, 91-6 
atomic identity and, 93, 212 
four-dimensional (4-D NMR), 
220-1 
measurement, 95-6 
multidimensional, 96, 212, 221-4 
advantages of, 224 
anisotropic samples in, 224-5 
applications of, 218-25 
comparison with X-ray 
diffraction, 256-7 
coupling constant (J), 95, 215, 
216, 217 
cross-peaks in, 215, 218 
cross-relaxation in, 214 
effective rotational correlation 
time (Teff), 214 
heteronuclear, 216, 217, 220, 
221, 223 
homonuclear, 216, 217, 223 
isotopic labelling in, 220-21 
isotropic samples, 224-5 
limitations of, 225 
M, limit of, 221 
of RNA, 224 
off-diagonal peaks in, 215, 218 
relaxation in, 212-4 
sample concentration in, 226 
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Nuclear magnetic resonance (Cont.) 
solid state, 224, 225 
spin-pairing, 94-5, 215 
time-resolved, 256 

physical basis, 91-3 
31P 96, 97, 212 
probe, 95 
proton, 91-4 
shielding, 93-4 
spectrometer, 96 
spectroscopy, 91-6 
spectrum, 94 
measurement of, 95-6 
spin coupling, 94-5 
three-dimensional (3-D NMR), 
214 
two-dimensional (2-D NMR), 82, 
214 
detection period, 215-6, 221 
evolution period, 215-6, 
221 
mixing period, 215-6, 221 
preparation period, 215-6, 221 
Nuclear Overhauser effect (NOE), 
214, 218-9 
inter-residue, 218-9 
spectroscopy (NOESY), 214 
Nucleation-condensation model of 
protein folding, 204—5 
Nucleic acids 
absorbance spectroscopy, 60-1 
databases, 306, 309 
electrophoresis in agarose gels, 
177-183 
circular dichroism, 83 
Fourier transform infrared (FTIR) 
spectroscopy, 90 
Nucleosome, 289 
Nucleus 
stable, 231 
unstable, 231 


160 

AMU, 117 

% abundance, 117 

scattering factor, 236 
170 

AMU, 117 

% abundance, 117 
180 

AMU, 117 

% abundance, 117 
O-linkage, of sugars, 367, 369 


Octyl group, in hydrophobic 
interaction chromatography, 
29 
O’Farrell, Patrick, 349 
Ohm’s law, 148 
Oligo A, 31 
Oligo dT, 31 
Oligonucleotide, 77, 162, 163-4, 
192, 194, 221, 245, 247, 288, 
298, 359, 362 
Olson, Arthur, 346 
Open reading frames (ORF), 310 
Open source, programs, 334, 336 
Optical 
centre, see also asymmetry, 
78-9 
detection in flow cytometry, 
287 
isomers, 78 
pumping, 100-1, 103 
rotatory dispersion (ORD), 
78-80 
systems in analytical 
centrifugation, 271 
traps, 102 
“tweezers”, 102 
ORD (optical rotatory dispersion), 
78-80 
ORF, see Open reading frames 
Organelles, 288 
Orientation 
effects 
in absorbance spectroscopy, 
59-60 
in ESR, 98 
methods in LD, 84 
Oriented samples, 83 
Origin, of reciprocal lattice, 240 
Ostwald viscometer, 264 
OTMPM, see 1-Oxyl-2, 
2,5,5,-tetramethyl 
D3-pyrollidine-3-methyl)- 
methane 
thiosulphonate 
Ouchterlony, double 
immunodiffusion, 174—5 
Ovalbumin, 169 
Overall plate height equation, 18 
Oxidation, 372—4 
Oxidative stress, 372 
Oxidoreductase assays, 59, 63 
Oxygen isotopes, 360 
Oxyluciferin, 66 


(1-Oxyl-2, 2,5,5,-tetramethyl 
D3-pyrollidine-3-methyl)- 

methane thiosulphonate (OTMPM), 
99, 101 


32P 376 
31p 
AMU, 117 
% abundance, 117 
in NMR, 212 
p-53, circular dichroism of, 82 
31P NMR, 97 
PANDIT database, 307, 327 
Parabolic flow, 263-4 
Paramagnetic, 98 
Partial specific volume (v), 266 
Partition coefficient (K), 11-2, 19 
Particle diameter, 19, 268 
Parts per million (PPM), 94, 215 
Pascal’s triangle, 95, 98 
Patterson, difference maps, 243-4 
Pauli exclusion principle, 65 
PDB, see Databases 
PDI (plasma desorption ionization), 
see MS 
Peak 
area, 14 
base, 117 
broadening, 15 
leading, 22 
splitting, 94-5, 98 
symmetry, 19 
trailing, 22 
Pellicular particles, 41 
Pentane, 43 
Pentose phosphate pathway, 
374-5 
Pepsin, 169 
Pepsinogen, 169 
Peptide 
deletion, 133, 135 
fragmentation in MS, 125-6 
mapping 
by MS, 133, 136 
by reversed phase HPLC, 30, 
50 
by SDS PAGE, 157 
microheterogeneity, 
by FAB/MS, 136 
size-excluded bank, 282 
synthesis, 133-5, 320 
Percol, 273 
Performic acid, 8, 136, 138, 377 


Perfusion chromatography, see also 
Chromatography 
practice of, 45 
stationary phases, 45-6 
theory of, 44-5 
Perfusive 
particles, 45 
pores, 45 
Periodic acid, 370 
Permeate, 281 
Peroxidase, 65, 362 
Peroxireductases, 374 
Peroxyredoxin, 374 
Perutz, Max, 242 
Pfam database, 307, 325 
P-glycoprotein, 74 
pH 
definition of, 3 
effect on biomolecules, 3, 166-7 
effects on proteins, 23, 167-9 
gradient, 168-70 
free ampholytes, 167-8, 349 
immobilized (IPG), 168-70, 
349-52 
titration curves, 3—4 
variation in elution from IMAC 
columns, 38 
Phage, 161-2 
Phase 
angle (nx), 240, 241-2 
bonded, 12, 42 
diagram, 230 
liquid, 256, 294 
mobile, 12, 17-8, 21 
of waves, 641 
problem, 241-3 
solid, 195, 256, 294 
stationary, 12, 17-8, 263 
Phenanthrolene, 269 
Phenyl group, 201 
in hydrophobic interaction 
chromatography, 29, 31 
Phenylalanine, 201 
absorbance spectrum, 60 
deprotonation, 60 
Amax, 60 
Phenylisocyanate, 138 
Phenylisothiocyanate (PITC), 
138-40, 360 
Phenylmethy] sulphony] fluoride 
(PMSF), 8 
Phosphatase, 63, 65, 190, 272, 362 
Phosphate, 7 


Phosphination, 361 
Phosphocreatine, 97 
Phosphonium groups, 359, 361 
Phospholipid, 224 
Phosphoproteome, 376-9 
Phosphopeptide, 377, 378 
Phosphorescence, 66 
Phosphorylation, 97, 131-2, 139, 
141, 190, 376 
Photocleavage site, 358 
Photoconversion, 89 
Photodetector, 61-2 
Photodiode arrays, 62 
Photoelastic modulator, 80-1 
Photoelectric effect, 54—5 
Photomultiplier, 61-2, 80-1, 
115, 
Photon, 55, 100 
Photosynthesis, 99 
Physical attributes 
of biomolecules, 2 
of cells, 288 
Piezoelectric crystal, 287 
Pilot scale protein purification, 40 
PIRSF, 307, 327-8 
pK,, 3, 59-60 
Principle of superposition, 53, 87 
Plane of polarization, 77 
rotation of, 78 
Planck’s constant, 57, 92, 215 
Planck, Max, 54 
Plane-polarized light, see also 
Circular dichroism (CD), 73, 
77-84, 103 
Plasma, 120 
beam, 120 
desorption ionization mass 
spectrometry (PDI/MS), see 
Mass spectrometry, 
Plasmin, 108-9 
Plasmon, 105 
resonance spectroscopy, see 
Surface plasmon resonance 
(SPR) 
Plate height equation, 15-9, 44-5 
Plate number, 16-9 
Polarisation, 73 
factor (P), 241 
plane of, 79 
Polarity 
effects in absorbance 
spectroscopy, 59-60 
solvent, 5 


INDEX 401 


Polarized light, 77-8 
circularly, 78-9, 81 

Polaroid, 77, 81, 

Population inversion, 100 

Positional cloning, 77 

Polyacrylamide, 150, 178 

Poly-alcohols, 230, 232 

Polybuffer, 173 

Polyethylene glycol (PEG), 230, 367 

Poly-L-lysine, secondary structure 
and CD, 81-2 

Polymerase chain reaction (PCR), 
165, 314 

Polymorphism, 194 

Poly Pro II, 323 

Polysaccharides, 150 

Poly(styrenedivinylbenzene), 21, 45 

Polyvinylidenedifluoride (PVDF) 
membrane, 195, 370 

Ponceau red, 151 

Population inversion, 100, 102 

Porphyrins, 288 

Potential energy, 276-9 


barrier, 276-9 
POROS, 21, 45 
Power 


constant, 148 
electrical, 148 
Precession, 212 
Precipitant, 230 
Precipitation 
acetone, 285 
acid, 285 
ammonium sulphate, 284—5 
methods, 284-6 
of DNA, 285 
of proteins, 5, 
salt, 284-5 
solvent, 285 
trichloroacetic acid (TCA), 285 
Precipitin, 174-5 
Predict protein server, 324-5 
Primer walking, see Gene walking 
Prion, 184 
Prism, 54, 
Probe, 99—100 
PROCHECK, 307, 342 
Programs, 307-8 
Projection density map, 255 
Proline, 323 
Protamine K, 8 
2-Propanol, 43 
PROSITE, 307, 326-7 
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Proteases, 47, 
Protein 


A, 31 
acylation, 130-1 
aggregation, 5, 
binding, 362-3 
biochips, 362 
carboxymethylation, 125 
charge isomerisation, 130-1 
chromophores, 59-62 
circular dichroism spectra of, 
80-2 
cis peptide bond, 201-2 
class I and II, 203 
complex formation, 297 
concentration determination, 8, 63 
concentration of, 40, 
crosslinking, 159 
deamidation of, 131-2 
denaturation of, 155, 200-203 
disulphide, 125, 373 
isomerase (PDI), 375 
-DNA interactions, 102 
domains, 213, 325 
downstream processing of, 39-40 
explorer, 334 
extrinsic fluors, 67 
families, 325 
fragmentation in MS, 125-7 
fourier transform infrared (FTIR), 
87-8 
folding, 73, 81-3, 200-6, 208-12 
and fluorescence, 73 
intermediates, 203-4 
pathways, 204-6 
problem, 199-206 
two-state model, 73, 82, 201, 
203-4 
FTIR analysis of, 87—90 
G, 31, 
globular, 6, 27-8, 84, 
glycosylation, 125, 130-1, 133 
graphics, 307, 331, 334-6 
green fluorescent (GFP), 89 
hydrogen bonding, 6, 155 
Information Resource (PIR), 
305-7, 309, 311-2 
industrial scale chromatography, 
39-40 
intrinsic fluors, 67 
kinase, 189, 362, 372, 376 
ladder sequencing, 138-40 
ligand binding to, 103, 107-10 


melting temperature (T) of, 203, 
300-2 
membrane, 101, 254—5 
microheterogeneity of, 130-1 
mis-folding, 336 
motif, 325 
N-formylation, 131 
N-formy] methionine in, 131 
nitrosylation, 372 
oligomeric, 297 
oxidation, 132, 372 
phosphatases, 190, 376 
phosphorylation, 130-1, 140 
post-translational modification, 
130-1, 133, 137, 366-79 
primary structure, 199 
precipitation, 5, 230 
proteolysis, 132, 366-7 
purification, 28, 47-50, 284 
quantitation, 8 
quaternary structure, 27 
ragged ends, 130, 132 
random coil, 323 
viscosity of, 266 
rational protein design, 200 
recombinant, 362 
reduction, 125 
salt bridges in, 155 
secondary structure of, 323 
determination; by CD, 80-3; by 
FTIR, 89 
prediction, 323-5 
sequencing by MS, see also 
Edman degration, 130, 
136-40 
single molecule studies, 102-3 
size fractionation of, 25-8 
stability of, 47 
subunits, 25-8, 366 
surface area, 200 
tertiary structure, 199-200, 376 
transfer efficiency, 192 


Proteome, see also Proteomics, 1, 


113, 349-50 


Proteomics, 349-80 


descriptive, 349 
functional, 349 
interaction, 349 
organelle, 273 


Proton NMR, 91 
Protoplast, 196 
PSA server, 325 
Pulse 


dampeners, 41 
laser, 121 
of radiofrequency radiation, 93 
time, 179 
Pulsed field gel electrophoresis, see 
Electrophoresis 
Pump 
Peristaltic, 274 
Reciprocating, 41 
Pumping, optical, 100, 102 
Purification 
factor, 31 
protocol, 47 
table, 50 
Pulse time (¢,), in pulsed field gel 
electrophoresis, 179 
PyMOL, 334, 336 
Pyridine, 7 
Pyrollidone carboxylic acid, 131 


Q group, see Quaternary ammonium 


QAE, see Quaternary aminoethyl 
Quadropole, ion storage system, 125 
Quantum, 54 

number, electronic, 55 

spin, 92 

theory, 54 

yield, 68-9 
Quasimolecular ion, 118, 119 
Quartz 

chip, 359 

cuvette, 62 

crystal, 80 
Quaternary ammonium (Q), 24 
Quaternary aminoethyl (QAE), 24 
Quenching of fluorescence, 68-73 

collisional, 70 

dynamic, 69-70, 73 

efficiency, 71 

external, 69—72 

internal, 68 

mechanisms, 70 

static, 70, 73 

temperature-dependence, 71-2 


R factor, 241, 252-3 
Radial flow chromatography, 39 
Radiation 

emitted, 68 

gamma, 57 

infrared, 56-8 

longwave, 57 

microwave, 57-8, 98 


radiowave, 57, 92, 98, 124, 213 
shortwave, 57 
synchrotron, 253 
ultraviolet, 56-8 
visible, 54-8 
X-rays, 57-8 
Radicals, 115 
Radioactivity, 367 
Radiolabel, 161, 164 
Radiowave radiation, 57, 92, 98, 124 
Raman 
scattering, 90 
spectrometer, 90 
spectrum, 91 
Ramachandran 
diagram, 323 
window, 336, 340 
RasMol, 257-8, 334 
RasTop, 335 
Rayleigh 
line, 91 
scattering, 90-1 
Reactive oxygen species (ROS), 372 
Real space averaging, 245 
Reciprocal space, 238-9 
Reducing 
agents, 8, 157-8 
enzymes, 374 
Refinement 
constraints, 250 
restraints, 250 
of structure, 241, 251-4 
by least squares, 252-3 
by simulated annealing, 252 
Reflectance, total internal, 105 
Reflectivity, 105-7, 110 
Refractile bodies, see Inclusion 
bodies 
Refraction, of light, 79 
Refractive index, 79, 103, 105, 271, 
288 
Relative centrifugal force (RCF), 270 
Relative molecular mass, M,, 1, 8 
Relative phase angle (a1), 243-4 
Relaxation, see also NMR 
cross, 214, 217 
longitudinal, 213 
transverse, 213 
Relibase, 346 
Reorientation angle, 179, 181, 183 
Replicative form (RF) of M13, 
162-3 
Reporter molecules, 59, 99 


Research Collaboratory for 
Structural Bioinformatics 
(RCSB), 306 

Resolution, 15 

in chromatography, 14—5 

in electron diffraction, 254 

in electrophoresis, 148, 183 

in MS, 123 

in structure determination, 251 
Resonance 

condition, 92 

electron paramagnetic (EPR), 98 

electron spin (ESR), 96-100 

energy transfer, 73-5 

efficiency of, 73-4 

hybrids, 155 

nuclear magnetic (NMR), 91-6 

surface plasmon (SPR), 103, 

105-10 

Restraints, on bond angles, 217, 218 
Restriction 

digest map, 192 

enzyme, 179-80, 192, 362 

fragments, 192, 194 

length polymorphisms, 192, 
194 

Retardation 

coefficient (K,), 153-4 

factor (RF), 15 

in chromatography, 15 
Retentate, 281 
Retention, 14-5 

time, 14 

volume, 14 

Reversed phase, chromatography, 
see also Chromatography, 
28-9 

HPLC, 355 
In LC/MS, 127 

Rhodes, Gale, 342 

Ribonuclease, 64, 204, 301 

Ribonucleotide reductase, 374 

Ribosome, 83, 246-7 

Ribulose-1,5-bisphosphate 
carboxylase-oxygenase 
(RUBISCO), 210-1 

RNA 

adenosine deaminase (ADAR1), 
247-8 

hybridization, 77 

Amax, 60 

NMR analysis of, 224 

-protein complex, 246-7 
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Roentgen, Wilhelm, 235 
Roeperstaff, nomenclature in MS, 
135, 126 
Root mean square deviation (rmsd), 
336, 344 
Rotating anode generator, 235, 
247 
Rotating field gel electrophoresis, 
183 
Rotation, 227 
Function, 244 
Rotational correlation time, effective 
(Teff), 214 
Rotor, 269 
fixed angle, 270 
spindle, 269-70 
speeds, 276 
swinging bucket, 270 
rRNA, 31 
RS convention, for enantiomers, 78 
Ruby laser, 100 


325 
AMU, 117 
% abundance, 117 
33 S 
AMU, 117 
% abundance, 117 
345 
AMU, 117 
% abundance, 117 
35 S 
AMU, 117 
% abundance, 117 
S group, see Methyl sulphonate 
Safety loop, 37 
Salt bridges, in proteins, 201-2 
Salting out, 285 
Sample cell 
double-sector, 271 
ITC, 295 
single-sector, 271 
Sanger 
Institute, 306, 325 
dideoxynucleotide sequencing, 
161-5 
Saturation point, 230 
Sayle, Roger, 334 
Scalar coupling, see Spin coupling 
Scaling factor (K), 241 
Scanning probe microscopy (see 
also Atomic force 
microscopy), 103 
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Scattering 
anomalous, 236, 245-50 
coherent, 236 
factor ( f), 236, 245, 250 
Raman, 90 
Rayleigh, 90-1 
SCOP database, 307, 331, 334 
Schiff base, 370 
Scrapie, see Spongiform 
encephalopathies, 
transmissible (TSE), 
Screw axis, 227 
SDS PAGE, see Electrophoresis 
Sealed tube generator, 235 
Secondary structure, see Protein or 
DNA 
prediction, 323-4 
Sedimentation, 266 
coefficient (s), 268, 275 
apparent (s*), 275 
equilibrium analysis, 276, 279, 
280 
flux (Js), 276 
velocity, 266 
analysis, 274-6, 277-9 
SELDI (surface-enhance laser 
desorption ionization), 362, 
365 
Selective desorption, 119 
Selenomethionine, 245 
Separation factor, 19 
Sequence 
databases, 309-15 
nucleotide, 309 
protein, 309 
tagged sites (STS), 315 
Sequential resonance assignment, 
218-9 
Serine 
hydrolases, 364-6 
N-linkage of sugars, 368-9 
O-linkage of sugars, 368-9 
phosphorylation, 131, 376 
proteases, 280 
Shark’s tooth comb, 161-2 
Shear 
force (f), 264 
gradient (G), 264-5 
stress, 264 
B-Sheet, 80, 221-3, 324-5, 331, 334 
Shielding 
constant, 93 
in ESR, 98 


in NMR, 93 
Shotgun genomics, 314-5 
Sialic acid, 137 
Signal transduction cascade, 376 
Silica, 42, 183, 187 
Silicate, 42 
Siloxanes, 42 
Silver staining of proteins, 151, 350 
Simulated annealing, 252-3 
Sinapinic acid, 118 
Single-molecule chemistry, 102 
Single-stranded conformation 
polymorphism (SSCP) 
analysis of DNA, 165-8 
Singlet state, 65 
Site-directed 
mutagenesis (SDM), 222, 243, 
247 
spin labeling, 99, 
101-2 
Sitting drop, 232-3 
Small multidrug resistance proteins, 
323 
Smoluchowski equation, 71 
Sodium dodecyl sulphate (SDS), 8, 
157, 190 
Solid phase capture of 
phosphopeptides, 378 
Solid phase peptide synthesis, see 
Peptide synthesis 
Solvation, 
shell, 5—6, 200, 253, 285 
of proteins, 200, 276 
Solvent 
elutropic series, 42-3 
perturbation, 59-60 
polarity, 5 
precipitation, 285 
Southern, Prof Ed, 191 
Soya bean trypsin inhibitor, 169 
SP group, see Sulphopropyl 
Space group, 227, 228 
Sparging, in chromatography, 20, 42 
Sparse matrix sampling, 231 
Specific activity, 8 
heat, see Heat capacity 
Spectrofluorimeter, 68-9 
Spectrometer 
ESR, 98 
NMR, 92 
Raman, 90 
Spectrophotometer 
diode array, 61 


dual beam, 62 
infrared, 86 
single-beamed, 62 
split-beamed, 62 
Spectropolarimeter, 80-1 
Spectroscopy, 53-112 
UV-visible, 58-64 
circular dichroism (CD), 79-83 
electron spin resonance (ESR), 
96, 98-9 
fluorescence, 64—77 
infrared, 84—91 
linear dichroism (LD), 83-4 
NMR, 91-6 
Raman, 90-1 
transitions, 56 
Spectrum, see also Electromagnetic 
spectrum, 55-6, 98 
absorbance, 60-1, 66, 89 
circular dichroism (CD), 80, 82 
derivative, 61, 98 
difference, 61, 86, 89 
emission, 64 
electron spin resonance (ESR), 
98, 100 
fluorescence, 66 
FTIR, 89 
infrared, 85-6, 89 
linear dichroism (LD), 85 
mass, see MS 
nuclear magnetic resonance 
(NMR), 94, 97, 219 
Raman, 91 
Sphingomyelinase, 65 
Spin, see also Magnetic spin 
coupling, 94-5, 215 
coupling, 94-5, 215 
labelling, site-directed, 99, 101-2 
labels, 99 
accessibility parameter, 102 
magnetic, 65 
moment, 96, 97 
net, 95 
pairing, 216 
probes, 99-100 
quantum number, 92-3, 98, 212 
state, 92, 212, 214 
Spindle, 269-70 
Splice signal, 310 
Spontaneous reactions, 295 
Spongiform encephalopathies, 
transmissible (TSE), 184, 
336 


Spot 
shape, 352 
identification, 353 
Squeezed gel orientation in linear 
dichroism, 83-4 
Stabilising forces, in proteins, 
200-203 
Stability, 2-6 
Stable isotope tagging, 355, 357 
Stacking, in isotachophoresis, 
151-2, 188 
Staining 
of dot blots, 174 
methods in electrophoresis, see 
Electrophoresis 
selective, 151 
of western blots, 192 
Standard curve 
in electrophoresis, 154 
in protein estimation, 63 
Static quenching, 70 
Stationary phase, 11, 17 
Stereoisomers, 78 
Sterile filtration, 282-3 
Stern-Volmer 
equation, 70 
dynamic constant, 71 
plot, 71 
relationship, 70-1 
Stimulated emission, 100 
Stokes-Einstein equation, 72 
Stokes 
radius, 276 
lines, 91 
Stretched polymer, technique in LD, 
84 
Structural effects, 3 
Structurally conserved regions 
(SCR), 340-1 
Structure 
crystal, 246-8 
factor (Fhgi), 237, 243 
amplitude (| Finx|), 240-1, 243 
in anomalous diffraction, 245, 
249-50 
calculation, 240-1 
refinement, 251-3 
Subcellular fractionation, 272—3 
Sublimation, 286 
Subproteome, 349-50 
Substrates, chromogenic, 64-5, 192 
Succinate dehydrogenase, 272 
Succinic anhydride, 360, 372 


Sucrose, use in solvent perturbation, 
6l 
Sulphenic acid, 372, 374 
Sulphinic acid, 372 
Sulphonic acid, 372 
Sulphonation, 359, 361, 364, 367, 
368 
Sulphopropyl (SP) group, 23 
Sulphydryl group 
alkylation, 136, 138 
counting of, 136, 138, 158-9 
oxidation, 135, 138, 
in proteins, 135, 138, 202 
Superposition, principle of, 53, 87 
Supersaturated region, of phase 
diagram, 230 
SuperStar, 346 
Suppression effect, 119 
Surface area, of proteins, 200 
Surface plasmon, 103, 105 
resonance (SPR), 103, 105-10 
equipment used, 105-7 
imaging, 107 
ligand binding, 106 
uses of, 106-10 
Svedberg, (S), 268 
equation, 268, 276 
Swiss Bioinformatics Institute, 306 
Swiss-PdbViewer, 336, 338, 339, 
342-3 
SWISS-MODEL, 307, 342 
SwissProt, 309, 322 
Symmetry 
centre of, 78—9 
crystallographic, 227 
elements, 227 
internal, 227, 239 
non-crystallographic, 227, 245 
operations, 227 
Synchrotron, sources, 235, 245, 
253-4, 256 
Synthetic peptides, 133-5 
SYPRO orange, 151 
Systeme International d’ Unites (S.I.) 
system, 7-8, 381 


'8Tc, 120 

Tagging methodologies 
in activity-based profiling, 364—6 
in MS, 355 
isotope-coded affinity, 356-8, 360 
N- and C-termini, 358-60 
tandem MS, 359- 
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Tanaka, Koichi, 355 
Tandem MS, see Mass spectrometry 
Target sequence, 336, 341 
TATA box, 310 
TEMED, 150 
Template sequence, 336, 341 
TEMPO, 
2,2,6,6-Tetramethylpiperidinol- 
N-oxyl-, 99 
4-isothiocyanato-, 99-100 
Temperature, 
effects on biomolecules, 3—5 
factor (B), 252 
melting (Tm), 64, 194, 299-300, 
301 
Terminal transferase, 289 
Tetrahydrofuran, 43 
3,5,3’,5'-Tetramethylbenzidine see 
TMBZ 
Tetramethylsilane, 94 
Tetrazolium salt, 153, 155 
Thalidomide, 78—9 
Theoretical plates, 15, 17 
height equivalent to (HETP), 16 
number, 15 
in CE, 183 
Thermal vibration, 254 
Thermocouple, 86 
Thermodynamics, 293-6 
closed system in, 294 
first law, 292 
parameters, 292-5, 300 
protein folding, 202-4 
second law, 294 
Thermophilic bacteria, 5 
Thermostatting, 188 
Thioredoxin, 325-6, 374, 376 
reductase, 374 
Thiol tags, 357-8 
solid phase, 358 
Thorium oxide, 123 
Threonine, 
chirality, 79 
magnetic spin, 216 
N-linkage of sugars, 368-9 
O-linkage of sugars, 368-9 
phosphorylation, 376, 377 
Thrombin, 108 
cleavage site in GST fusion, 36 
Through-pores, see Perfusive pores 
Time-of-flight (TOF) analyzer, 120, 
365-6 
Total internal reflectance, 105 
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Tiselius, Arne, 148-9 
Titration curve, 4 
analysis, 170, 172 
TMBZ, 65 
Toluene, 43 
Torque, 212 
Torsion angles 
of proteins, 215, 252, 342 
restraints in NMR, 217, 218 
Trajectory arc, 123 
Transcript, 362 
Transcriptome, 349-50 
Transformation, of cells, 196 
Transition, 56—9 
electronic, 55 
gel-liquid, 100 
non-electronic, 58 
non-radiative, 64 
radiative, 64, 66 
rotational, 55 
state, 139, 142-3, 204-5, 294 
vibrational, 55, 84-5 
in Raman spectroscopy, 90-91 
Transmitter, 96 
Translation, 227 
function, 244 
TrEMBL, 306, 308, 309 
Trichloroacetic acid (TCA), 170 
Trifluoroacetic acid (TFA), 378 
Triplet state, 65 
Triton X-100, 8 
Trypsinogen, 169 
Tryptic peptides, 357 
Tryptophan, 64 
fluorescence spectrum of, 66 
Amax, 59 
TSEs, see Spongiform 
encephalopathies, 
transmissible (TSE) 
Tswett, 11 
Tubulin, 277-9 
TUNEL (terminal deoxynucleotidyl 
mediated dUTP nick 
end-labelling) assay, see 
Apoptosis 
Tungsten filament, 123 
Turns, 323 
ß-Turns, 80 
Two dimensional SDS PAGE (2-D 
SDS PAGE), 349-53 
basis of, 350-1 
equipment used, 350-1 
Tyrosine, 60 


absorption spectrum of, 60 
fluorescence spectrum of, 66 
Àmax, 59 

nitrosylation, 372 
phosphorylation, 376 


Ubiquitin, 300 

Ultrafiltration, 281-2, 286 

Ultraviolet radiation, 58—64 

Ultraviolet-visible absorbance 
spectroscopy, 58—64 

Unfolding, of proteins, 64 

Uniform resource locator (URL), 9 

Uniprot, 306, 312-3 

Unit cell, 226, 228, 250 

Units, 7—8 

Universal gas constant, 5, 203, 268, 
279, 292 

UNIX, 308, 334 

Unsaturated region, of phase 
diagram, 230 

Uranyl formate, 255 

Urea, 8, 161, 190, 267, 276, 351 

Urease, 169 


Vacuum transfer, 192 
van Deemter, 
curve, 19 
equation, 19 
van der Waals, 
contact distance, 201-2 
interactions, 201, 226, 252, 
263 
radius, 201 
van ’tHoff, 
equation, 72, 300 
relationship, 203-5 
Vancomycin, 297 
Variable number tandem repeat 
(VNTR), 314-5 
Vector 
equilibrium magnetization, 
212 
in Patterson maps, 244 
self, 244 
cross, 244 
Velocity 
gradient, 263—4 
in an electric field, 146 
linear, 264 
parabolic, 264 
Vibrational transition, 55, 84 
Vinblastine, 279 


Vinca alkaloids, 277—80 
Vincristine, 279 
Vinorelbine, 279 
Vinylpyridine, 359 
Visceoelastic relaxation time (t,), of 
DNA, 179, 182 
ratio to pulse time, 179 
Viscometer, 
Couette, 265 
floating rotor, 265 
Ostwald, 264, 267 
Viscosity (n), 263-7 
definition, 263-4 
dependence on solute properties, 
266 
dependence on temperature, 268 
and ESR, 99 
and fluorescence, 71—2 
intrinsic ([7]), 265 
measurement of, 264—7 
relative (7,), 264 
specific (nsp), 265 
Visible radiation, 54-8 
Void volume, 15 
Voltage, 148, 183 
constant, 148 
stopping, 55 
Volume, 
included, 15 
void, 15 
von Willebrand factor, 108 


Walther, Dirk, 336 
Water, 5—6 
desolvation, 200 
bulk, 296 
structural, 200, 251 
Watson-Crick base pairing, 77, 163, 
192 
Wavelength, 55 
-dependent coefficients, 245, 249 
of electromagnetic radiation, 
55-6, 98 
emission, 73 
excitation, 73 
maximum (Amax), 60-1 
X-rays, 234 
Wavenumber, 85 
Wave-particle duality, 55 
Wave 
evanescent, 105 
function, 54-5, 236, 241 
interference, 54, 87-8, 100 


phenomena, 54 
vector, 105 
WebMol, 336-7 
Websites, 306-7 
Western blot, 172, 184, 189-90, 
316, 353, 372-3 
detection by chemiluminescence, 
68 
protein transfer efficiency, 192 
Worldwide web (www), 8-9, 305, 
308 
Wiitrich, Kurt, 212, 218 


Xenon, 123 
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X-gal, 65 
X-ray, 225, 235 
angle of incidence (0), 105, 236, 
238 
beam, 235 
comparison with 
multi-dimensional NMR, 
255-7 
crystallography, 225-54 
diffraction of, 225-6, 235-9 
diffraction patterns, 236, 237 
scattering 
anomalous, 236, 245-7, 
254 
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coherent, 236 
factor (f), 236 
source, 235 
tunable, 253 
wavelength, see Wavelength 


Yeasts, hemiascomycetous, 307 


Z-DNA, 247 

Zone electrophoresis, 149, 
353 

Zwitterion, 4, 152 

Zymogen, 280 

Zymogram, 155-6, 366 
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FIGURE 6.32. Crystals of lysozyme. Depending on the crystallization conditions used a large number of smaller crystals (a) or a small 
number of larger crystals (b) can be obtained. 
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FIGURE 9.11. The Ramachandran diagram. The Ramachandran diagram represents all possible values for œ-C to carbonyl C (Psi) and 
«-C to amide N (Phi) bond angles as —180 to + 180°. The plot shown is for aquaporin 1 (structure as inset) which is exclusively c-helical. 
The region where B-sheet would be found if present is shown. 
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(b) There is 1 sequence with the following architecture: TPR_2, TPR_1, Thioredoxin, Exo_endo_phos, RVT_1 x i 
A2Z003_ORYS) [Oryza sativa subsp. japonica (rice)) putative uncharacterized protein (1722 residues) 
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There is 1 sequence with the following architecture: Thioredoxin, Aminotran_1_2 
Q11FOS MESSB [Mesorhizobium sp. (strain bnc1)] aminotransferase, class i and ï (514 residues) 


There is 1 sequence with the following architecture: Thioredoxin, Trypsin 
A6COI8 OPLAN [Planctomyces maris dsm 8797] probable thioredoxin (591 residues) 
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There is 1 sequence with the following architecture: DUF659, Thioredoxin 
A7PQ14 VITVI [Vitis vinifera (grape)) chromosome chr18 scaffold_24, whole genome shotgun sequence (304 residues) 


E E 


There is 1 sequence with the following architecture: BSD, Thioredoxin 
Q01i1GGO_ OSTTA [Ostreococcus tauri] thioredoxin-like protein (iss) (726 residues) 
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A7PQ14_VITVI 
This is the summary of UniProt entry A7PO14 VITVI? (A7?0147) 


‘Description: Chromosome dw 18 scaffold_24, whole genome shotgun sequence 
Source organism: (NORI taxonomy ID 29760) (? 
Mew 


proteome data. 
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Pfam domains 


This image shows the arrangement of the Pfam domains that we found on this sequence. Clicking on a domain will take you to the page describing that Pfam 
entry. The table below gwes the domain boundaries for each of the domains. Note that some domains may be obscured by other, overlapping domains. This is 
noted in the table where applicable 
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Source Domain Start End 
PfamA QUF559 1 g 
Piem Thioredoxin 248 296 


FIGURE 9.14. The Pfam database. (a) The screen generated when the search-term ‘thioredoxin’ is entered. Searches may also be made by 
sequence. (b) Detail of listing of thioredoxin family members. Note how easy it is to visualize shared domains. (c) Detail of a single entry 
in listing. This specifies exactly where domains begin and end. 
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RasKMol> Select ligand 
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FIGURE 9.24. Viewing structures with RasMol. The PDB file 1J4N for aquaporin 1 was opened as a default wireframe model. The 
commands ‘select ligand’ and ‘spacefill’ were typed into the command line to reveal the location of 3 molecules of (-nonylglucoside with 


which the protein was cocrystallized. 
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FIGURE 9.25. Viewing structures with RasTOP. The file 1J4N was opened and represented in the colour pulldown as ‘structure’ and in 
the ribbon pulldown as ‘trace’ .The command box was used to apply the RasMol commands: ‘select ligand, spacefill, colour white’ to reveal 


B-nonylglucosides. 
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FIGURE 9.26. Viewing structures with Protein Explorer. (a) PDB file 1J4N was opened with Protein Explorer and hide water and toggle 
spinning off were selected. RasMol commands can be typed in the command box on lower left of screen. (b) Selecting explore more allows 
us to modify the image with three pulldowns. Selections are spacefill in Display pulldown and polarity 3 in Colour pulldown. Note that 
hydrophobic residues are lighter-coloured than hydrophobic in this representation. 
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FIGURE 9.27. Viewing structures with PyMOL. The structures of human aquaporin 1 (1J4N), an Escherichia coli nodulin-like intrinsic 


protein (1RC2) and an aquaporin found as a major intrinsic protein in bovine lens fibres (1TM8) are shown in a single window. These are 
all members of the aquaporin family. 
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ESyPred3D - ESyPred3D is a new zutomared homology modeling program The method gets benefit of the increased alignment peformances of a new aligmunent strategy using neurel networks. 
Alignments are cbtamed by combuung, weightng and screemng the results of several multiple alignment programs. The final three dimensional structure is built usng the modeling package MODELLEE 
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Data and parameters input for the ESyPred3D server: 
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FIGURE 9.31. On-line homology modelling with ESyPred3D. (a) The sequence from NCBI of canine aquaporin is entered in ESyPred3D. 
(b) The model returned by e-mail is viewed with PyMol together with the structure which gave greatest sequence alignment (94%), human 


aquaporin 1. The model is represented as a solid cartoon while the superimposed 1J4N is in ribbon. Note that the structures deviate in some 
places while they overlay closely in others. 
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FIGURE 10.5. Difference gel electrophoresis (DIGE). Two samples (test and control) to be compared are labelled separately with distinct 
fluorescent dyes (e.g. green and red). They are then mixed and separated on a single 2-D SDS PAGE gel. Intensity of spots shows if they are 
equal in both samples (yellow), up-regulated (green) or down-regulated (red). 
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FIGURE 10.16. Detecting gene expression with DNA Microarrays. mRNA is isolated from matched test and control cells and converted 
to cDNA by reverse transcriptase. Each cDNA population is tagged with a fluorescent dye (red for test, green for control) and washed over 
the microarray. cDNA hybridises to its complementary gene. Un-hybridized cDNA is removed by washing. Scanning the microarray with 
a laser reveals up-regulated (green), down-regulated (red) and equal gene expression (yellow) at each spot of the microarray. Spots are in 
duplicate and the array contains both positive and negative controls. 


